python - Redirected output from a subprocess call getting lost?
I have some Python code that goes like this, using some libraries that you may or may not have:

    import subprocess

    # Open for writing
    vcf_file = open(local_filename, "w")

    # Download the region to the file.
    subprocess.check_call(["bcftools", "view",
        options.truth_url.format(sample_name), "-r",
        "{}:{}-{}".format(ref_name, ref_start, ref_end)], stdout=vcf_file)

    # Close the parent process's copy of the file object
    vcf_file.close()

    # Upload it
    file_id = job.fileStore.writeGlobalFile(local_filename)

Basically, I'm starting a subprocess that's supposed to go download some data for me and print it to standard out. I'm redirecting that data to a file, and then, as soon as the subprocess call returns, I'm closing my handle to the file and copying the whole file elsewhere.
I'm observing that, sometimes, the tail end of the data I'm expecting isn't making it into the copy. Now, it's possible that bcftools is just occasionally not writing that data, but I'm worried that I might be doing something unsafe and somehow getting access to the file after subprocess.check_call() has returned, but before the data that the child process writes to standard output makes it onto the disk where I can see it.
Looking at the C standard (since bcftools is implemented in C/C++), it looks like when the program exits normally, all open streams (including standard output) are flushed and closed. See the [lib.support.start.term] section here, describing the behavior of exit(), which is called implicitly when main() returns:
--Next, all open C streams (as mediated by the function signatures declared in <cstdio>) with unwritten buffered data are flushed, all open C streams are closed, and all files created by calling tmpfile() are removed.30)
--Finally, control is returned to the host environment. If status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned. If status is EXIT_FAILURE, an implementation-defined form of the status unsuccessful termination is returned. Otherwise the status returned is implementation-defined.31)
So before the child process exits, it closes (and thus flushes) standard output.
However, the manual page for the Linux close(2) notes that closing a file descriptor does not necessarily guarantee that any data written to it has actually made it to disk:
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored, use fsync(2). (It will depend on the disk hardware at this point.)
Thus, it would appear that, when a process exits, its standard output stream is flushed, but if that stream is actually backed by a file descriptor pointing to a file on disk, the write to disk is not guaranteed to have completed. I suspect that this may be what is going on here.
So, my actual questions:
Is my reading of the specs correct? Can a child process appear to its parent to have terminated before its redirected standard output is available on disk?
Is it possible to somehow wait until all data written by the child process to files has actually been synced to disk by the OS?
Should I be calling flush() or some Python version of fsync() on the parent process's copy of the file object? Can that force writes to the same file descriptor by child processes to be committed to disk?
Yes, it could be minutes before the data is physically written to the disk. But you can read it long before that.
Unless you are worried about a power failure or a kernel panic, it doesn't matter whether the data is on disk. The important part is whether the kernel thinks that the data is written.
It is safe to read from the file as soon as check_call() returns. If you don't see all the data, it may indicate a bug in bcftools, or that writeGlobalFile() doesn't upload all the data from the file. You could try to work around the former by disabling the block-buffering mode for bcftools' stdout (provide a pseudo-tty, use the unbuffer command-line utility, etc).
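The claim that the file can be read as soon as check_call() returns can be checked with a small experiment. The sketch below is only an illustration: it uses a Python one-liner as a stand-in for bcftools (an assumption, so that it runs without bcftools installed), and the helper name run_and_read_back is made up for this example.

```python
import os
import subprocess
import sys
import tempfile


def run_and_read_back(line_count):
    """Redirect a child's stdout to a file, then read it back immediately."""
    # Stand-in child process (assumption: replaces bcftools for the demo).
    child_code = (
        "import sys\n"
        "for i in range(int(sys.argv[1])):\n"
        "    print('line', i)\n"
    )
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as out_file:
            subprocess.check_call(
                [sys.executable, "-c", child_code, str(line_count)],
                stdout=out_file,
            )
        # No flush() or fsync() here: the child has exited, so everything
        # it wrote is already visible to readers through the kernel.
        with open(path) as in_file:
            return in_file.read().splitlines()
    finally:
        os.remove(path)


lines = run_and_read_back(100000)
print(len(lines), lines[-1])
```

If the last line were ever missing in a test like this, that would point at the child or at the later copy step rather than at the redirect itself.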
Q: Is my reading of the specs correct? Can a child process appear to its parent to have terminated before its redirected standard output is available on disk?
Yes. Yes.
Q: Is it possible to somehow wait until all data written by the child process to files has actually been synced to disk by the OS?
No. fsync() is not enough in the general case. Likely, you don't need it anyway (reading the data back is a different issue from making sure it is written to disk).
Q: Should I be calling flush() or some Python version of fsync() on the parent process's copy of the file object? Can that force writes to the same file descriptor by child processes to be committed to disk?
It is pointless. .flush() flushes buffers that are internal to the parent process (you can use open(filename, 'wb', 0) to avoid creating unnecessary buffers in the parent).
fsync() works on a file descriptor (the child has its own file descriptor). I don't know whether the kernel uses different buffers for different file descriptors referring to the same disk file. Again, it doesn't matter: if you observe data missing (with no crashes), fsync() won't help here.
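For completeness, if durability across a power failure does matter, a minimal sketch of the unbuffered open plus os.fsync() might look like the following. It does nothing for the missing-data symptom. The helper name run_child_and_sync and the one-line child command are made up for illustration, and the assumption that the parent's fsync() also covers the child's writes rests on the fsync(2) manual page describing the call in terms of the file rather than a single descriptor, so verify it on your platform.

```python
import os
import subprocess
import sys
import tempfile


def run_child_and_sync(path, child_argv):
    """Run child_argv with stdout redirected to path, then fsync the file."""
    # Buffering 0 is only allowed in binary mode; with it the parent keeps
    # no userspace buffer of its own, so there is nothing to flush().
    with open(path, "wb", 0) as out_file:
        subprocess.check_call(child_argv, stdout=out_file)
        # Ask the kernel to push the file's dirty data to the storage device.
        # This is about surviving a crash, not about visibility to readers.
        os.fsync(out_file.fileno())
    return os.path.getsize(path)


# Demo with a stand-in child (assumption: not the asker's bcftools command).
fd, demo_path = tempfile.mkstemp(suffix=".out")
os.close(fd)
try:
    size = run_child_and_sync(
        demo_path, [sys.executable, "-c", "print('hello')"]
    )
    print(size, "bytes written and synced")
finally:
    os.remove(demo_path)
```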
Q: To be clear, I see that you're asserting that the data should indeed be readable by other processes, because the relevant OS buffers are shared between processes. But what's your source for that assertion? Is there a place in a spec or the Linux documentation you can point to that guarantees that those buffers are shared?
Look for "After a write() to a regular file has successfully returned":
Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
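The quoted guarantee can be illustrated with two independent descriptors on the same file: one writes, the other reads, and no fsync() happens in between. This is a rough sketch rather than a proof, and the helper name read_after_write is made up for this example.

```python
import os
import tempfile


def read_after_write(data):
    """Write data through one descriptor and read it through another."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    # Two separate open() calls, so the descriptors share no userspace state.
    writer = os.open(path, os.O_WRONLY)
    reader = os.open(path, os.O_RDONLY)
    try:
        written = os.write(writer, data)
        # No fsync(): once write() has returned, a read() of the same byte
        # positions is expected to return the new data.
        return os.read(reader, written)
    finally:
        os.close(writer)
        os.close(reader)
        os.remove(path)


payload = b"tail end of the data\n"
result = read_after_write(payload)
print(result == payload)
```

The same reasoning applies when the writer is a child process: its write() calls have all returned by the time it exits.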