Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

We have a remote process logging into our system and writing files to a specific directory. Will the values supplied by File::Stat (mtime, size, etc.) exist and be *constantly* updated during the time the file is being written to our disk by the remote process? I'd like to be able to compare mtime or size against a prior call to "time" or "stat" to see if the file is currently being written. I am aware of the lsof and fuser commands, but I'd like to use something a little more portable and I think comparing the instantaneous modification time or file size might do the trick.

Replies are listed 'Best First'.
Re: Do the File::Stat values Update While a File is Being Written?
by CountZero (Bishop) on Jul 14, 2008 at 20:23 UTC
    File::Stat (or File::stat) are just OO-wrappers around the CORE stat and lstat functions and provide easier access to the results of a regular (l)stat call.

    Thus they exhibit the same behaviour as stat or lstat. As these CORE functions rely on what your OS provides, it will depend on your OS whether the data returned by the internal "stat" is reliable while the file is still being written.

    As a bold guess, I would dare to say that none of its values can be fully relied upon until the file being written is "closed".

    But why don't you just try it? Write a script that writes (slowly) to a file for a few minutes. Start another script that stats this file repeatedly during this period and reports on the values found.
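    A minimal sketch of that experiment, assuming a POSIX-like system. The sub names slow_write and watch_file and their parameters are made up for illustration; the writer uses syswrite so that each write reaches the OS immediately.

```perl
use strict;
use warnings;
use File::stat;                 # stat() now returns an object
use Time::HiRes qw(sleep);      # allows fractional-second sleeps

# Writer: write $count one-byte records to $path,
# pausing $delay seconds after each one.
sub slow_write {
    my ($path, $count, $delay) = @_;
    open my $fh, '>', $path or die "open $path: $!";
    for (1 .. $count) {
        syswrite($fh, '.') or die "syswrite $path: $!";
        sleep $delay;
    }
    close $fh or die "close $path: $!";
}

# Watcher: stat $path $samples times, $interval seconds apart.
# Returns one [size, mtime] pair per sample (undef if the file is missing).
sub watch_file {
    my ($path, $samples, $interval) = @_;
    my @seen;
    for (1 .. $samples) {
        my $st = stat($path);
        push @seen, $st ? [ $st->size, $st->mtime ] : undef;
        sleep $interval;
    }
    return @seen;
}
```

    Run slow_write in one script (say, 300 records with a 1 second delay) and watch_file on the same path in another, then print the pairs to see how size and mtime move on your OS.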

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Do the File::Stat values Update While a File is Being Written?
by Joost (Canon) on Jul 14, 2008 at 21:01 UTC
    This all depends on what you mean by "being written". On POSIXy systems, syswrite and the resulting updates to mtime & friends should be regarded as atomic (though probably not when recovering from errors), so any OS-level write should update mtime, file size, etc., regardless of whether the data has actually been written to disk yet.

    OS-level caching is completely transparent to these operations. Which also means that there's no such thing as "being written" - the data has either been written (given no errors occurred), or none of it has. From the POV of these operations there is no in between.

    For instance:

        #!/usr/local/bin/perl -w
        use strict;
        open F,">/tmp/test" or die $!;
        for (1 .. 1000) {
            syswrite F,".";
            my $s = -s F;
            warn "$s != $_" if $s != $_;
        }
    The code above reports no anomalies on my system.

    Confusion may set in because perl does some additional caching by itself (when using the canonical print(f) function):

        #!/usr/local/bin/perl -w
        use strict;
        open F,">/tmp/test" or die $!;
        for (1 .. 1000) {
            print F ".";
            my $s = -s F;
            warn "$s != $_" if $s != $_;
        }
    Gives:
        0 != 1 at test.pl line 7.
        0 != 2 at test.pl line 7.
        0 != 3 at test.pl line 7.
    etc etc.

    See also syswrite.
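    To see the effect of perl's own buffering side by side, here is a small sketch. The sub name sizes_after_prints is hypothetical; it records the file size after every print, with or without autoflush on the handle. It assumes the default buffered PerlIO layers and a count well below the buffer size.

```perl
use strict;
use warnings;
use IO::Handle;   # provides the autoflush method on file handles

# Print $count single bytes to $path and record the size that the OS
# reports after each print. With $autoflush true, every print is
# flushed to the OS at once; otherwise the data sits in perl's buffer.
sub sizes_after_prints {
    my ($path, $count, $autoflush) = @_;
    open my $fh, '>', $path or die "open $path: $!";
    $fh->autoflush(1) if $autoflush;
    my @sizes;
    for (1 .. $count) {
        print {$fh} '.';
        push @sizes, (-s $fh) || 0;   # -s is false for an empty file
    }
    close $fh or die "close $path: $!";
    return @sizes;
}
```

    With autoflush the recorded sizes should track the number of prints, as in the syswrite example; without it they should stay at 0 until the buffer fills or the handle is closed.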

    updated to fix typo in second example

    update 2: bottom line:

    If your remote program sends write()s to the system often enough, and/or your peeking of mtime etc is infrequent enough, your strategy should work. But any program doing fairly heavy caching by itself (and I would guess many interesting programs do) may throw it off. So be conservative when choosing your intervals.
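    A rough sketch of that polling strategy. The names snapshot and unchanged are invented for this example, and the check is only a heuristic: a writer that buffers heavily can look idle between flushes.

```perl
use strict;
use warnings;
use File::stat;

# Record the size and mtime of $path; returns undef if stat fails.
sub snapshot {
    my ($path) = @_;
    my $st = stat($path) or return;
    return { size => $st->size, mtime => $st->mtime };
}

# True (1) when both snapshots exist and report the same size and mtime.
sub unchanged {
    my ($old, $new) = @_;
    return 0 unless $old && $new;
    return ( $old->{size}  == $new->{size}
          && $old->{mtime} == $new->{mtime} ) ? 1 : 0;
}
```

    Take a snapshot, sleep for an interval comfortably longer than the writer is likely to go between flushes, take another, and treat the file as finished only if unchanged() holds, preferably over more than one interval.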

Re: Do the File::Stat values Update While a File is Being Written?
by kyle (Abbot) on Jul 14, 2008 at 20:16 UTC

    Could your remote process flock the file as it's writing?
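    If the writer can be made to cooperate, the flock idea might look like the sketch below. The sub names are hypothetical. flock is advisory, so this only helps when the remote process itself takes the lock, and behaviour over network file systems varies.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Writer side: open $path for appending and take an exclusive lock.
# Appending avoids truncating the file before the lock is held.
# The lock is released when the returned handle is closed.
sub open_locked_for_write {
    my ($path) = @_;
    open my $fh, '>>', $path or die "open $path: $!";
    flock($fh, LOCK_EX) or die "flock $path: $!";
    return $fh;
}

# Reader side: returns 1 if a shared lock can be taken without blocking,
# meaning no cooperating writer holds the exclusive lock right now.
sub writer_is_done {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    my $ok = flock($fh, LOCK_SH | LOCK_NB);
    close $fh;
    return $ok ? 1 : 0;
}
```

    The reader would call writer_is_done($path) before touching the file and simply try again later while it returns 0.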

    The stuff returned by stat should be up-to-date, but it might only be from the last buffer write. That is, if you're writing to the file very slowly (so that data is buffered without being written for a long time), the mtime will be updated infrequently.

Re: Do the File::Stat values Update While a File is Being Written?
by jethro (Monsignor) on Jul 14, 2008 at 20:46 UTC
    Another method could be to seek to the end of the file and check the position, but this too will be influenced by buffering
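    For illustration, the seek-to-end check could be sketched like this. The sub name end_position is made up; it reports how many bytes the OS currently knows about, so data still held in the writer's buffers is not counted.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Open $path read-only, seek to the end, and return the byte offset there.
sub end_position {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    seek($fh, 0, SEEK_END) or die "seek $path: $!";
    my $pos = tell($fh);
    close $fh;
    return $pos;
}
```

    Comparing two calls made some time apart shows whether the file has grown in between.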

      If you check with a higher frequency than the buffer is flushed, you will get wrong results and will never be certain when the file is "finished".

      CountZero


Re: Do the File::Stat values Update While a File is Being Written?
by swampyankee (Parson) on Jul 15, 2008 at 02:36 UTC

    The answer will depend on the o/s and how it's been tuned. I suspect (suspect is one step above wild-assed guess) that a file that's being overwritten won't be updated until the write operation is completed, so that a communication failure won't leave a corrupted file. For a new file, the size and modification information will be updated when the write-buffer is flushed.

    The answer will depend on the o/s, how it's been tuned, and the details of how the file transfer protocol is managed at the receiving end.


    Information about American English usage here and here. Floating point issues? Please read this before posting. — emc

Re: Do the File::Stat values Update While a File is Being Written?
by Anonymous Monk on Jul 15, 2008 at 17:13 UTC
    Many, many thanks to all of you for your assistance. I have posted questions here off and on and almost always received the help I needed.

    I think I'm just going to break down and use "lsof" after all. It just seems to be the most reliable way to get this job done.