dragonchild has asked for the wisdom of the Perl Monks concerning the following question:

I've been given the unenviable task of determining when a file stops growing. Under Unix, that's easy. -s, -M, (stat)[9], File::Modified, etc.
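For the Unix case, the checks named above can be combined into a small polling helper. This is a minimal sketch, not code from the thread: `stat_signature` and `has_stopped_growing` are hypothetical names, and the approach assumes size and mtime are trustworthy, which the rest of this post says is not the case on Windows while a copy is in progress.

```perl
use strict;
use warnings;

# Return "size:mtime" for $path, or undef if it cannot be stat'ed.
sub stat_signature {
    my ($path) = @_;
    my @st = stat($path);
    return unless @st;
    return "$st[7]:$st[9]";    # element 7 is size, element 9 is mtime
}

# Return 1 if size and mtime stay unchanged across $checks consecutive
# polls spaced $interval seconds apart, 0 otherwise.
sub has_stopped_growing {
    my ($path, $checks, $interval) = @_;
    my $prev = stat_signature($path);
    return 0 unless defined $prev;
    for (1 .. $checks) {
        sleep $interval if $interval;
        my $cur = stat_signature($path);
        return 0 unless defined $cur && $cur eq $prev;
    }
    return 1;
}
```

A caller might use `has_stopped_growing($file, 3, 2)` and retry while it returns 0.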

Windows? Hah! If you're copying a file in Windows, -s $dest returns the full size and -M $dest (or (stat $dest)[9]) returns the last access time for the source!

I'd really like to avoid reading in the entire file and doing a length check on the bytes that are there, if possible. Some of the files I'll be working with are 500Gig. Also, some of the files could be binary, so I can't just count the number of newlines.

Has anyone figured out a solution for this?

------
We are the carpenters and bricklayers of the Information Age.

The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Those unreliable file stat operations in Win32
by dragonchild (Archbishop) on Sep 30, 2003 at 18:30 UTC
    Thanks to all who responded. The answer ended up being that flock requires an open filehandle, and Windows won't allow a filehandle to be opened if the file hasn't been fully written. (Something about copying to a temp file and renaming.)

    The following script will work.

    (Update: Removed the flock check because it isn't needed and will hang on the Samba share I'm testing with.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    use IO::File;

    my ($fn) = @ARGV;
    die "Must pass a filename\n" unless defined $fn && length $fn;

    # Maximum number of checks for each phase.
    my %checks = (
        exist => 10,
        age   => 3,
        fopen => 20,
    );

    # Seconds to sleep between checks in each phase.
    my $std_wait = 2;
    my %wait = (
        exist => $std_wait,
        age   => $std_wait,
        fopen => $std_wait,
    );

    # Phase 1: wait for the file to appear.
    # (&& rather than ||, so the counter is decremented and the loop can end.)
    my $file_exists  = -e $fn;
    my $exist_checks = $checks{exist};
    while (!$file_exists && --$exist_checks) {
        sleep $wait{exist};
        $file_exists = -e $fn;
    }
    unless ($file_exists) {
        die "File '$fn' doesn't exist in $checks{exist} checks\n";
    }

    # Phase 2: wait until mtime is unchanged for $checks{age} checks in a row.
    my $old_mod    = (stat $fn)[9];
    my $num_checks = $checks{age};
    while ($num_checks) {
        sleep $wait{age};
        my $current_mod = (stat $fn)[9];
        if ($old_mod == $current_mod) {
            $num_checks--;
            next;
        }
        # mtime changed: start counting again.
        $num_checks = $checks{age};
        $old_mod    = $current_mod;
    }

    # Phase 3: wait until the file can be opened for appending.
    my $fh;
    my $fopen_checks = $checks{fopen};
    while (--$fopen_checks) {
        $fh = IO::File->new(">>$fn") and last;
        sleep $wait{fopen};
    }
    unless ($fopen_checks) {
        die "Could not open '$fn' for appending in $checks{fopen} checks\n";
    }
    $fh->close;

    print "File has fully arrived!\n";


      I see you have a solution, so this is probably irrelevant, but I did discover an interesting possibility whilst playing with this. It would only work if the volume receiving the output file has the compressed attribute enabled.

      There is another API, GetCompressedFileSize(), which returns the actual storage requirements rather than the logical size, for compressed or sparse files. I just tried this to see what values I get from it whilst copying a file that is 1 GB logical / 500k actual on a compressed volume, and discovered that it reports the full file size (1 GB) immediately after the destination file is opened. The reported size then slowly decreases over time until the file is closed, at which point it returns the final compressed size.

      Whether this is any easier than just trying to open the file I doubt, but I found it interesting anyway.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
      If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Those unreliable file stat operations in Win32
by demerphq (Chancellor) on Sep 30, 2003 at 16:52 UTC

    Er, I'm not really sure you are right about this. (I haven't checked, but it doesn't square with my feeble and fallible memory.) But if you are, then I would try to get a lock on the file. AFAIK, Win32 will lock the file while copying, so if you can get an exclusive lock on the file, the copy must be done. IME, mtime is safe to use; I'll have to look into this deeper and see where the discrepancy comes from.


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


Re: Those unreliable file stat operations in Win32
by BrowserUk (Patriarch) on Sep 30, 2003 at 18:33 UTC

    The problem arises because NTFS preallocates the space for the whole file in one hit if it knows the final size, which it obviously does when using the CopyFile() API or an application that resolves to it. This is what allows the OS to report "There is not enough space on the disk." before it actually starts transferring data.

    Perhaps the simplest way of determining when the copy is finished would be to loop over trying to open the file until you succeed. If the application doing the copying cooperates by opening its output file with an exclusive lock, this will tell you when the file has been closed and the copy completed.

    If the application doing the copying isn't being nice and opening the output file exclusively, then you could use the native API OpenFile (via Win32::API::Prototype or similar) to try and open the file OF_SHARE_EXCLUSIVE yourself. Once this succeeds, the copy has finished.

    You might also try looking at the functions in Win32::File to see if one of those will allow you to attempt an exclusive open of the file, but I've never managed to decipher what's going on in there -- the pod is scary:)



Re: Those unreliable file stat operations in Win32
by runrig (Abbot) on Sep 30, 2003 at 17:01 UTC
    You can try using flock. I don't know if Windows locks a file while copying or if it depends on Windows version, but if it does lock, then you just wait till you can get a lock on the file. (My initial test copying a 62MB file says yes it does lock the destination file while copying on XP).
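The flock idea above might look something like the following. It is a minimal sketch under stated assumptions: `can_lock` and `wait_for_lock` are hypothetical names, the non-blocking flag `LOCK_NB` is used so that a held lock makes the call fail rather than wait, and whether the copying process actually holds a lock that flock can see depends on the platform and filesystem (dragonchild reports a hang on a Samba share, so treat network shares as untested here).

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Try once to take an exclusive, non-blocking lock on $path.
# Returns 1 if the lock was obtained (it is released again at once),
# 0 if the file could not be opened or is locked by someone else.
sub can_lock {
    my ($path) = @_;
    open(my $fh, '+<', $path) or return 0;    # flock needs an open filehandle
    my $got = flock($fh, LOCK_EX | LOCK_NB) ? 1 : 0;
    flock($fh, LOCK_UN) if $got;
    close $fh;
    return $got;
}

# Retry can_lock up to $tries times, sleeping $interval seconds between tries.
sub wait_for_lock {
    my ($path, $tries, $interval) = @_;
    for (1 .. $tries) {
        return 1 if can_lock($path);
        sleep $interval if $interval;
    }
    return 0;
}
```

For example, `wait_for_lock($file, 20, 2)` gives up after roughly 40 seconds instead of blocking indefinitely.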
Re: Those unreliable file stat operations in Win32
by Aragorn (Curate) on Sep 30, 2003 at 17:32 UTC
    I have no Perl on Windows experience, so I could well be talking complete nonsense, but maybe it's possible to open the file, fseek to the (current?) end and use ftell to get the current position. Repeat using a sensible pause.
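The seek-and-tell suggestion above could be sketched as follows. The helper names `end_offset` and `wait_until_stable` and the option names are made up for illustration. Nothing is read from the file, so it stays cheap even for very large or binary files. Whether the reported offset reflects the bytes written so far or the preallocated size during a Windows copy is not something this sketch has been checked against.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Return the current end-of-file offset by seeking, without reading data.
# Returns undef if the file cannot be opened or the seek fails.
sub end_offset {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    binmode $fh;
    seek($fh, 0, SEEK_END) or return;
    my $pos = tell($fh);
    close $fh;
    return $pos;
}

# Poll until the offset is unchanged for stable_needed consecutive checks.
# Returns the final offset, or undef if max_polls runs out first.
sub wait_until_stable {
    my ($path, %opt) = @_;
    my $interval      = defined $opt{interval}      ? $opt{interval}      : 2;
    my $stable_needed = defined $opt{stable_needed} ? $opt{stable_needed} : 3;
    my $max_polls     = defined $opt{max_polls}     ? $opt{max_polls}     : 100;

    my ($last, $stable) = (undef, 0);
    for (1 .. $max_polls) {
        my $now = end_offset($path);
        if (defined $now && defined $last && $now == $last) {
            return $now if ++$stable >= $stable_needed;
        }
        else {
            $stable = 0;    # size changed or file unreadable: start over
        }
        $last = $now;
        sleep $interval if $interval;
    }
    return;
}
```

A call such as `wait_until_stable($file, interval => 5)` returns the size once three consecutive polls agree.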

    Arjen

Re: Those unreliable file stat operations in Win32
by Thelonius (Priest) on Sep 30, 2003 at 18:41 UTC
    I just tested stat on Windows 2000/ActivePerl 5.8.0 and it worked fine. Maybe you should show a little code and we can figure out why it's not working. If you're really working with 500Gig files, not 500MB, I can see that there could be a difference since that's > 32 bits!!
      The problem is that the stat operator will lie while the file is still being copied. It will give the values for the source, not the destination. Check the code on my scratchpad for an example. (Run on both Windows and Unix for a comparison.)
