michael.barnes has asked for the wisdom of the Perl Monks concerning the following question:

I have a small script that basically watches a folder. When a file is put into the folder, the script moves it to a different folder based on the file suffix. Randomly, the script will crash. I have run strace on the script and found what I believe is the crash point, but I don't know what to do about it.

Here is the relevant portion of the script:
foreach $wavfile (@wavfile) { $newwavfile = $wavfile; $newwavfile =~ s/\/mnt\/xfer/\/mnt\/Audio\/dropbox\/wa +v/; move ($wavfile, $newwavfile) or die "Holy moly we got +us a wav error here hoss!"; }
Here is an strace segment where the crash occurs:
19:51:13.072532 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x9a670c8) = -1 ENOENT (No such file or directory) 19:51:13.073455 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x9a670c8) = -1 ENOENT (No such file or directory) 19:51:13.073558 stat64("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav" +, {st_mode=S_IFREG|S_ISGID|S_ISVTX|0767, st_size=280369278, ...}) = 0 19:51:13.073722 rename("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav" +, "/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04-25_DIST.wav") = -1 +EXDEV (Invalid cross-device link) 19:51:13.073837 stat64("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav" +, {st_mode=S_IFREG|S_ISGID|S_ISVTX|0767, st_size=280369278, ...}) = 0 19:51:13.073960 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x9a670c8) = -1 ENOENT (No such file or directory) 19:51:13.074157 stat64("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav" +, {st_mode=S_IFREG|S_ISGID|S_ISVTX|0767, st_size=280369278, ...}) = 0 19:51:13.074293 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x9a670c8) = -1 ENOENT (No such file or directory) 19:51:13.074396 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x9a670c8) = -1 ENOENT (No such file or directory) 19:51:13.074532 open("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav", +O_RDONLY|O_LARGEFILE) = -1 ETXTBSY (Text file busy) 19:51:14.087239 unlink("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav") = -1 ENOENT (No such file or directory) 19:51:14.088126 write(2, "Holy moly we got us a wav error "..., 83) = +83 19:51:14.106966 exit_group(26) = ?

The attempt to open the file and the "-1 ETXTBSY (Text file busy)" error seems to be the crash point.



Here is an strace segment where the script works properly:
19:52:36.606041 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x94190c8) = -1 ENOENT (No such file or directory) 19:52:36.606938 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x94190c8) = -1 ENOENT (No such file or directory) 19:52:36.607039 stat64("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav" +, {st_mode=S_IFREG|S_ISGID|S_ISVTX|0767, st_size=280369278, ...}) = 0 19:52:36.607181 rename("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav" +, "/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04-25_DIST.wav") = -1 +EXDEV (Invalid cross-device link) 19:52:36.607336 stat64("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav" +, {st_mode=S_IFREG|S_ISGID|S_ISVTX|0767, st_size=280369278, ...}) = 0 19:52:36.607460 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x94190c8) = -1 ENOENT (No such file or directory) 19:52:36.607680 stat64("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav" +, {st_mode=S_IFREG|S_ISGID|S_ISVTX|0767, st_size=280369278, ...}) = 0 19:52:36.607818 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x94190c8) = -1 ENOENT (No such file or directory) 19:52:36.607920 stat64("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04 +-25_DIST.wav", 0x94190c8) = -1 ENOENT (No such file or directory) 19:52:36.608058 open("/mnt/xfer/31650_WRETCHED Radio 04-25_DIST.wav", +O_RDONLY|O_LARGEFILE) = 3 19:52:36.608888 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfdbb698) = +-1 ENOTTY (Inappropriate ioctl for device) 19:52:36.608963 _llseek(3, 0, [0], SEEK_CUR) = 0 19:52:36.609032 fstat64(3, {st_mode=S_IFREG|S_ISGID|S_ISVTX|0767, st_s +ize=280369278, ...}) = 0 19:52:36.609131 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0 19:52:36.609225 open("/mnt/Audio/dropbox/wav/31650_WRETCHED Radio 04-2 +5_DIST.wav", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 4 19:52:36.609739 ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfdbb698) = +-1 ENOTTY (Inappropriate ioctl for device) 19:52:36.609837 _llseek(4, 0, [0], SEEK_CUR) = 0 19:52:36.609905 fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 19:52:36.610003 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0 19:52:36.610077 fstat64(3, {st_mode=S_IFREG|S_ISGID|S_ISVTX|0767, st_s +ize=280369278, ...}) = 0 19:52:36.610200 mmap2(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE +|MAP_ANONYMOUS, -1, 0) = 0xb7d07000 19:52:36.610284 read(3, "RIFFv\30\266\20WAVEfmt \20\0\0\0\1\0\1\0D\254 +\0\0\210X\1\0"..., 2097152) = 2097152 19:52:36.840512 write(4, "RIFFv\30\266\20WAVEfmt \20\0\0\0\1\0\1\0D\25 +4\0\0\210X\1\0"..., 2097152) = 2097152 19:52:36.842789 read(3, "\331\v\20\v\35\v\235\v\377\vW\f\311\fo\rT\16P +\17e\20\214\21U\22P\22\206\21j\20"..., 2097152)= 2097152 19:52:37.055588 write(4, "\331\v\20\v\35\v\235\v\377\vW\f\311\fo\rT\16 +P\17e\20\214\21U\22P\22\206\21j\20"..., 2097152) = 2097152

This time the attempt to open returns a '3' and the script proceeds to write the file to the new location.

The files vary from 5-50MB in size, but that does not seem to matter.

Any ideas on why this is happening and what I can do to fix it?


Thanks,
Michael

Replies are listed 'Best First'.
Re: Random script crashes related to File::Copy (care)
by tye (Sage) on Apr 29, 2009 at 16:09 UTC

    roboticus, no, the "Invalid cross-device link" error is caught by File::Copy::move() and causes it to try to copy the file and then unlink.

    runrig, no, "trying to delete the file while it is still being written to" is not in evidence in the traces provided. The trace shows the destination being unlinked (despite it never having been created) after the source could not be opened.

    As the original poster correctly noted, the problem is:

    open("/mnt­/xfer/3165­0_WRETCHED Radio 04-25_DIST­.wav", O_RDONLY|O­_L +ARGEFILE­) = -1 ETXTBSY (Text file busy

    So the addition of $! will likely add "Text file busy" to the output.

    I've only ever seen that error when trying to delete an executable image when it was in use. On a good Unix platform, "man 2 open" will actually note more details about what "Text file busy" really means in the case of open() failing.

    Indeed, open(2) says:

    ETXTBSY
    The file is a pure procedure (shared text) file that is being executed and the open() system call requests write access.

    which doesn't match the name *.wav nor O_RDONLY so the original poster might want to check "man 2 open" on his particular operating system (and hope that it has "good" man pages in this respect).

    - tye        

      Let me add a little more information. This script transfers several dozen files per day. The files are either plain text, mp3 or wav audio files. It may run flawlessly for several days, then it might crash 5 or 6 times in one day.

      If it crashes during the transfer attempt of a file, when restarted, it transfers the file without problem.

      Hence, I believe the problem is somehow timing related rather than the actual command structure. If the problem was really related to "Invalid cross-device link" errors or the like, would that not then cause the script to crash every time? I would think that type of error is more a result of an underlying problem.



      BTW, this is my first post on Perl Monks. I am truly impressed by the speed and quality of responses. Thank you all very much. I am anticipating a solution soon, thanks to your collective knowledge and kindness.



      Michael

        How do you know that the file has finished being written before you start copying it? In such situations, I usually recommend that you have a standardized naming convention for "still writing to the file" like "*.tmp" and then have whatever process is writing the file rename it to remove the ".tmp" after it is finished writing it.

        If after you have (already?) solved the above concern, the "text file busy" problem (besides causing me to research what that error really means which might even lead me to reading OS source code) would lead me to detecting move() failing and leaving $! set to ETXTBUSY (from Errno) and then just sleeping and retrying a couple of times.

        - tye        

        Hence, I believe the problem is somehow timing related rather than the actual command structure. If the problem was really related to "Invalid cross-device link" errors or the like, would that not then cause the script to crash every time

        Not all. After all, "Text file busy" means *something else* (another handle? another process?) is keeping the file "busy" (mmapped?).

Re: Random script crashes related to File::Copy
by roboticus (Chancellor) on Apr 29, 2009 at 15:50 UTC
    michael.barnes:

    It appears that the "Invalid cross-device link" error might be your problem. It appears that you're trying to move the file to a different device, and the move fails. Then it seems like the move routine is attempting to delete the filename you're trying to move it *to*. You might want to post the source of the move subroutine.

    UPDATE: It's clear that I was out in left field. Thanks to runrig and tye for clarifying and correcting things.

    ...roboticus
Re: Random script crashes related to File::Copy
by runrig (Abbot) on Apr 29, 2009 at 15:54 UTC
    You should add $! to the error message (since "Holy moly" doesn't really tell you what went wrong). But it looks like maybe move() is trying to delete the file while it is still being written to open the file while it is being executed (updated due to tye).
Re: Random script crashes related to File::Copy
by MidLifeXis (Monsignor) on Apr 29, 2009 at 16:58 UTC

    Perhaps a solution might be in the case of a text file busy error (see $!), instead of dying right away, sleep and retry, or skip and wait for the next iteration of your move script. If the file ages too much, then emit a warning or error of some sort.

    If it is truly just a temporary error, then retrying at a later time might be fully appropriate.

    --MidLifeXis

    The tomes, scrolls etc are dusty because they reside in a dusty old house, not because they're unused. --hangon in this post

Re: Random script crashes related to File::Copy
by mikeraz (Friar) on Apr 29, 2009 at 17:03 UTC

    What does your script do to ensure the file being put into the folder is completely copied there?

    Possibly, you are trying to move the file out before the creating process finishes writing the file. Not that the error you are getting, ETXTBSY, hint that this is the case. Neither does ETXBSY hint at a reasonable cause based on your description.

    Perhaps moving the file out before it has been fully written is part of your current problem. Even if not, your description hints that this may be an issue. See Check whether a file is open for a discussion of the problem.


    Be Appropriate && Follow Your Curiosity

      Since I am using the move() function from the File::Copy module, I have been relying on the correctness of the module to check the various things mentioned.

      In comparing the two strace excerpts, it appears to me that the error is occurring before the move process actually begins. In the strace segment that does work, I see that if the open attempt is successful, then the read/write sequences begin.

      I am still quite new to this level of script work. But if I am understanding the strace output correctly, this error is occurring before anything actually touches the file itself. It seems that something is opening the file and not getting it closed in time for the next open attempt. I just don't know what.

      May this involve delving into the File::Copy module and possible needing some kind of change or delay there? I have looked at the module, but the majority of it is way over my head.


      Thanks,
      Michael
        I have been relying on the correctness of the module to check the various things mentioned.

        Um, File::Copy::move()'s documentation says nothing about "Waits for the source file to be finished being created and then...".

        Trying to get the monitor/copier process to correctly determine whether the to-be-moved file is finished being written is likely a hard problem, perhaps an impossible problem (depending).

        That is why I outlined the "usual" way to have whatever is dumping these files into the source directory to do it in a way that makes it clear when the file being dumped there has been completely dumped (and is thus ready to be moved by the monitor/copier).

        - tye        

Re: Random script crashes related to File::Copy
by tokpela (Chaplain) on Apr 30, 2009 at 08:41 UTC
    As others have mentioned above, this may be because the file has not yet finished writing. I would use a "stable" file concept and check that the file has not been modified since X time period.
    # check if the file has stablized for over 30 seconds if (is_stable_file($file, 30)) { process($file); } sub is_stable_file { my $file = shift; my $wait_seconds = shift || 0; my $result = 1; if ($wait_seconds > 0) { my ($mtime) = (stat($file))[9]; my $secs_since_mod = (time - $mtime); if ($secs_since_mod < $wait_seconds) { $result = 0; } } return $result; }