Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to implement the following:
I need to treat a directory as a file queue. End-users will continuously put files into the directory. File size will be very large. I will have a background process written in Perl, to continuously copy files from that directory to some other directories.
The PROBLEM:
How can I ensure I only copy file which is a complete file? Because the file size is very large, I might copy some file which is still in the process of being copied from End-users' directory to that directory. I am thinking that I can 'seek' to the end of the file, check if condition eof() is true or not. If true, then this is a complete file, if not, then just skip dealing with this file. I don't know if this method would work.

Please help, thanks!

Replies are listed 'Best First'.
Re: Treat a directory as file queue
by tommyw (Hermit) on Aug 16, 2002 at 20:16 UTC

    The problem is that the file is always "complete". It's just that it may be more "complete" a moment later. So if you seek to the end, and then check for EOF as you suggest, it'll probably return true. It won't if the file has actually grown in between the seek and the test, but you may be testing too quickly. As the others suggest, you could include some delay, but there's the risk that the delay won't be enough.

    However, if that's your decision... rather than actually opening the files and making some decision based on the content, why not just look at the timestamp on the files, and consider them complete if they're old enough?

    time-(stat $file)[9]
    will tell you how many seconds since it was last written (note that you can't use -M, since that's based on the start time of the program, not the current time).
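A minimal sketch of that age test, for illustration only. The helper name is_old_enough and the threshold argument are assumptions, not part of the reply above, and the right threshold depends on how slowly your files arrive.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $file was last modified at least
# $min_age seconds ago. Name and threshold are assumptions.
sub is_old_enough {
    my ($file, $min_age) = @_;
    my @st = stat $file or return 0;       # file vanished or is unreadable
    return (time() - $st[9]) >= $min_age;  # element 9 of stat is the mtime
}
```

The background process could then call something like is_old_enough($path, 300) on each directory entry and skip whatever fails the test until the next pass.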

    A better option, if you're on a UN*X system, would be to use fuser on the file:

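    my $busy = system('fuser', '-s', $file) == 0;   # list form skips the shell; fuser -s exits 0 if some process has the file open
    print $busy ? "still open" : "complete";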

    --
    Tommy
    Too stupid to live.
    Too stubborn to die.

Re: Treat a directory as file queue
by Zaxo (Archbishop) on Aug 16, 2002 at 19:54 UTC

    On unix, you may have the lsof utility available.

    Update: Also see flock

    After Compline,
    Zaxo

Re: Treat a directory as file queue
by krisahoch (Deacon) on Aug 16, 2002 at 18:41 UTC
    How can I ensure I only copy file which is a complete file? Because the file size is very large, I might copy some file which is still in the process of being copied from End-users' directory to that directory.

    If you are in control of the upload process, there is a quick and easy way that I'd do it. NOTE: This is not tested, only a theory. There is an old way of doing things in C called flags. Since you are in control of the copying,

    1. Set a flag that references the file that is going to be uploaded.
    2. Start the copy.
    3. Unset the flag 0.5 seconds after the copy returns successfully.
      • If unsuccessful, do your own error checking thing
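One way the steps above could be sketched, using an empty marker file as the flag. The ".uploading" suffix and both sub names are made up for illustration, and this only works if the uploading side can be made to run code like this.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use Time::HiRes qw(sleep);

# Uploader side: set the flag, copy, then unset the flag.
sub upload_with_flag {
    my ($src, $dst) = @_;
    my $flag = "$dst.uploading";                  # hypothetical naming convention
    open my $fh, '>', $flag or die "cannot create $flag: $!";
    close $fh;
    copy($src, $dst) or die "copy $src -> $dst failed: $!";   # flag stays set on failure
    sleep 0.5;                                    # the short pause from step 3
    unlink $flag or die "cannot remove $flag: $!";
}

# Consumer side: a file is ready only if it exists and carries no flag.
sub is_ready {
    my ($file) = @_;
    return (-f $file) && !(-e "$file.uploading");
}
```

Because a failed copy leaves the flag in place, the consumer keeps ignoring the broken file until someone cleans up.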

    OR

    Run md5sum on both files every few seconds until the sums match. Note that md5sum is much, much slower, since it has to read both files in full each time.
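A rough sketch of the checksum comparison, assuming the checking process can read both the source file and the copy. It uses the Digest::MD5 module instead of shelling out to md5sum; the sub names md5_of and same_content are made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hex MD5 of a file, or undef if it cannot be opened.
sub md5_of {
    my ($path) = @_;
    open my $fh, '<', $path or return;
    binmode $fh;
    my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $sum;
}

# True once source and copy hash to the same value.
sub same_content {
    my ($src, $dst) = @_;
    my $s = md5_of($src);
    my $d = md5_of($dst);
    return defined $s && defined $d && $s eq $d;
}
```

Every call reads both files from start to finish, so on very large files you would want a generous pause between checks.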

    Just a thought

    Kristofer

Re: Treat a directory as file queue
by dws (Chancellor) on Aug 16, 2002 at 18:32 UTC
    How can I ensure I only copy file which is a complete file?

    Assuming that you have no control over the agent who is copying the file into the directory, one approach is to monitor the file size, and consider the transfer to be complete once the size has stopped changing for some fixed amount of time.
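A small sketch of size monitoring for a polling loop. The %last_size hash and the sub name size_unchanged are assumptions for illustration; the "fixed amount of time" is simply however long the background process sleeps between passes.

```perl
use strict;
use warnings;

my %last_size;   # file name => size seen on the previous pass

# Hypothetical helper, meant to be called once per polling pass.
# True only when the size matches what the previous pass recorded.
sub size_unchanged {
    my ($file) = @_;
    my $size = (stat $file)[7];      # element 7 of stat is the size in bytes
    if (!defined $size) {            # file is gone or unreadable
        delete $last_size{$file};
        return 0;
    }
    my $prev = $last_size{$file};
    $last_size{$file} = $size;
    return defined $prev && $prev == $size;
}
```

A file seen for the first time is never reported as complete, so each file waits at least one full polling interval. A stalled transfer can still look finished, which is the usual risk with any time-based test.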

Re: Treat a directory as file queue
by fglock (Vicar) on Aug 16, 2002 at 21:00 UTC

    Your users could save the files as "filename.PART", and then rename the files to just "filename" when they are ready.

    This is the method used by most download managers.
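A sketch of both halves of that convention, assuming the writing side can run code like this. The sub names deliver and complete_files are made up for illustration. rename is atomic on a local POSIX filesystem when both names are on the same filesystem; how it behaves across a network mount is something to verify for your setup.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Writer side: copy under a temporary name, then rename into place.
sub deliver {
    my ($src, $dst) = @_;
    my $part = "$dst.PART";
    copy($src, $part) or die "copy to $part failed: $!";
    rename $part, $dst or die "rename $part -> $dst failed: $!";
}

# Reader side: every plain file not ending in .PART counts as complete.
sub complete_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "cannot open $dir: $!";
    my @ready = sort grep { !/\.PART\z/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return @ready;
}
```

The background process only ever has to list the directory; it never needs to guess from timestamps or sizes.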

      Thanks for all the replies. Currently the end-users copy files from a Macintosh to a directory on Win2K, which is mounted by Linux using Samba. Under such circumstances, does it mean I have to choose a time range to determine whether a file is whole? Are there no alternatives to this?

Re: Treat a directory as file queue
by kingman (Scribe) on Aug 16, 2002 at 20:17 UTC
    On unix, you could also just symlink the directory to the other places where you need the data. I'm kind of wondering why you need to copy huge files around. You might want to look into the rsync utility too.