ecuguru has asked for the wisdom of the Perl Monks concerning the following question:

Hi!
I'd like to have Perl watch a folder and, when a file shows up, use it as input and perform operations on it. The users would SFTP directly to that folder.

My problem is that while a file is still uploading to the directory (multi-gig files), the Perl script reads the directory, sees the start of the file, and tries to perform operations on it.

So my Q is:
Can I have the directory list exclude files still uploading?
or
Can I detect that a file is still uploading?
or
Should I change my topology?
Thanks!!

Replies are listed 'Best First'.
Re: Monitor Folder, Wait for Upload?
by archfool (Monk) on Jul 13, 2007 at 15:48 UTC
    The safest way is to upload into a temp directory and move it when done.
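A minimal sketch of the "upload elsewhere, then move" idea, assuming the staging and watched directories sit on the same filesystem so that `rename` is atomic. The helper name `publish_upload` and its arguments are hypothetical, and who performs the move (the uploading client via an SFTP rename, or a server-side step) depends on your setup.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: publish a finished upload by renaming it from the
# staging directory into the watched directory. rename() is only atomic
# when both directories are on the same filesystem (an assumption here).
sub publish_upload {
    my ($staging_dir, $watched_dir, $name) = @_;
    my $from = File::Spec->catfile($staging_dir, $name);
    my $to   = File::Spec->catfile($watched_dir, $name);
    rename $from, $to or die "Cannot move $from to $to: $!";
    return $to;
}
```

With this layout the watching script never sees a partial file, because a name only appears in the watched directory once the whole file is already on disk.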

    Failing that, you can use the "-s" operator (or stat) to determine file size over time. If over N seconds, it hasn't grown any, it's probably finished uploading.
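The size check could be sketched roughly as follows. The helper name `wait_until_stable` and its parameters are made up for illustration. It uses `stat` rather than `-s` so that an empty file can be told apart from a missing one. Note that a stalled upload looks exactly like a finished one to this test, which is why the result is only "probably finished".

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 once the file size has been unchanged for
# $checks consecutive polls taken $interval seconds apart, or 0 if the
# file cannot be stat'ed (for example, it was removed).
sub wait_until_stable {
    my ($path, $interval, $checks) = @_;
    my $last   = -1;
    my $stable = 0;
    while ($stable < $checks) {
        my $size = (stat $path)[7];       # element 7 of stat() is the size in bytes
        return 0 unless defined $size;    # stat failed
        if ($size == $last) {
            $stable++;
        }
        else {
            $stable = 0;
            $last   = $size;
        }
        sleep $interval if $stable < $checks;
    }
    return 1;
}
```

For example, `wait_until_stable($file, 10, 3)` would return once three polls in a row, ten seconds apart, saw the same size as the poll before them. Pick an interval comfortably longer than any pause you expect from a slow client.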

Re: Monitor Folder, Wait for Upload?
by tirwhan (Abbot) on Jul 13, 2007 at 16:08 UTC

    If you're on *NIX you can use the lsof utility to see whether there is an active process still writing to the file. For example, to wait until the file is not being accessed anymore:

    while (`lsof filename` =~ m/upload\.cgi.*?\s\d+w\s/) { sleep 10; }

    All dogma is stupid.
Re: Monitor Folder, Wait for Upload?
by clinton (Priest) on Jul 14, 2007 at 08:49 UTC
    If you're running Linux >= 2.6.13, you could try using Linux::Inotify to put a watch on the folder.

    When there is an IN_CREATE event, put a watch on the created file, and when you receive an IN_CLOSE event on the file, you know that the upload has finished.

    Untested

    Clint
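A rough sketch of the inotify approach, with two substitutions to flag: it uses the CPAN module Linux::Inotify2 rather than Linux::Inotify, and it watches the directory for IN_CLOSE_WRITE instead of adding a per-file watch after IN_CREATE. The helper names `watch_dir` and `next_closed_files` are hypothetical. A close only means the writer let go of the file, so an interrupted upload would trigger it too.

```perl
use strict;
use warnings;
use Linux::Inotify2;

# Hypothetical helper: create an inotify object watching $dir for files
# that are closed after having been opened for writing.
sub watch_dir {
    my ($dir) = @_;
    my $inotify = Linux::Inotify2->new
        or die "Cannot create inotify object: $!";
    $inotify->watch($dir, IN_CLOSE_WRITE)
        or die "Cannot watch $dir: $!";
    return $inotify;
}

# Hypothetical helper: block until at least one event arrives, then return
# the full paths of the files that were closed after writing.
sub next_closed_files {
    my ($inotify) = @_;
    my @events = $inotify->read;
    return map { $_->fullname } grep { $_->IN_CLOSE_WRITE } @events;
}
```

A consumer would call `watch_dir` once and then loop on `next_closed_files`, processing each returned path.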

Re: Monitor Folder, Wait for Upload?
by saintly (Scribe) on Jul 13, 2007 at 17:59 UTC
    Try locking the file with flock. With any luck, your sftp daemon software has obtained a lock on the file while it's writing to it. You can either have flock block so that your program pauses and resumes when the file is finished uploading or not block flock (I just like saying that) and have your program immediately abort (or go back to sleep or whatever) to try again later. Typical usage:
    $lockOK = flock($filehandle,2); # Resumes program flow when you obtained the lock
    # or
    $lockOK = flock($filehandle,6); # Check return value to see if you were able to lock the file
    # .....
    flock($filehandle,8); # Release lock when done
    If for some bizarre reason the SFTP upload process doesn't bother locking the files it's writing to, you should email the author of that daemon to suggest it.
    Edit: Crap, it seems you're correct mr salva, sftp (and even the ancient 'ftp') don't bother flocking at all. I downloaded the source for the latest version of openssh I could find (4.6), and grepped for 'flock', and it would seem it never uses it. This solution is useless for your purpose then, unless you have a better SFTP daemon than OpenSSH.
      With any luck, your sftp daemon software has obtained a lock on the file while it's writing to it

      Don't expect that, that's not how Unix works!

      The latest version of the SFTP protocol has support for lock and unlock operations, but version 3, the one used by OpenSSH (probably the most widespread implementation out there), doesn't.