PerlMonks  

When is it safe to move a file?

by BoredByPolitics (Scribe)
on Jan 14, 2001 at 17:34 UTC ( [id://51733]=perlquestion )

BoredByPolitics has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a program which has to work with what I now believe is a broken interface. Its purpose is to process data files after they've been ftp'd into a specific directory - my program will probably be invoked via cron, so sometimes there won't be any data to process - no problem.

After the data file is written via ftp, a control file is also written via ftp - it's the existence of this control file that tells my program that it's safe to proceed.

Now, this is where the fun starts - if my program hasn't processed the data file before the machine which originally sent it has more data to send, the other machine sends the original data, plus the new data, as one data file, overwriting the original data file, then updates the control file accordingly.

I move the data file to a work directory before processing it. If I use the File::Copy::move subroutine, what will happen if I try to move the file while it's being overwritten by the other machine?

I can't now change the way the remote machine behaves, and I don't know in advance what the ftp server is going to be on the box my program is to run on (although I do know that the OS is Solaris).

Does anyone have any advice on how I can make my program work safely?

Thanks.

Pete

Replies are listed 'Best First'.
Re: When is it safe to move a file?
by kschwab (Vicar) on Jan 14, 2001 at 20:12 UTC
    I would recommend renaming both files before you start. If the ftp server is actively writing data to the files, it will continue writing to the renamed files, because its open file handles follow the rename. If the ftp client starts sending a new file after the rename(), it won't overwrite the renamed files.

    You can then loop for some reasonable amount of time, and if the mtime doesn't change, you can guess that the ftp server isn't writing to them anymore.

    Something like:

    if ( -f $control_file && -f $data_file ) {
        rename($data_file, $data_file . $$);
        rename($control_file, $control_file . $$);
    } else {
        exit;
    }
    while (1) {
        $mtime = (stat($data_file . $$))[9];
        # drop out of the loop if the file hasn't
        # changed in 5 minutes (perhaps longer
        # for a wan connection ?)
        last if (time > $mtime + 300);
        sleep(30);
    }
    #process the files

    This still leaves a small race condition between the two rename(s) in which the control file might have been overwritten. (a real small timeslice, but a timeslice nonetheless...)

    You mentioned you didn't have control over the process, but if you ever do get control, here's two ideas:

    • Have the ftp client send the files as file.tmp, then use the "rename file.tmp file" ftp command when the file is sent.
    • Lacking that, install an ftp server that does something similar. proftpd has a configuration parameter called HiddenStor that does this.
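
    The first idea can be sketched from the sending side roughly as follows. This is only an illustration, assuming the sender is a Perl script and that $ftp behaves like a connected, logged-in Net::FTP object; the sub name upload_then_rename and the ".tmp" suffix are made up for the example.

```perl
use strict;
use warnings;

# Upload under a temporary name, then rename on the server once the
# transfer has finished. $ftp is assumed to behave like a connected,
# logged-in Net::FTP object (put and rename return true on success).
sub upload_then_rename {
    my ($ftp, $local_file, $remote_name) = @_;
    my $tmp_name = $remote_name . '.tmp';   # hypothetical suffix

    # A reader polling for $remote_name never sees the partial upload.
    $ftp->put($local_file, $tmp_name)
        or return 0;

    # The final name only appears once the data is complete.
    $ftp->rename($tmp_name, $remote_name)
        or return 0;

    return 1;
}
```

    The receiving program then only ever looks for the final name, so it cannot pick up a half-written file.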

    Update: yep..I missed a sleep in the lower loop, which would burn lots of resources...fixed

      Says kschwab:
      Something like:

      while (1) {
          $mtime = (stat($data_file . $$))[9];
          # drop out of the loop if the file hasn't
          # changed in 5 minutes (perhaps longer
          # for a wan connection ?)
          last if (time > $mtime + 300);
      }

      That is a busy-wait loop. Even if the loop runs only once, you have called stat one million times and run up the load on the system for five minutes.

      Try it like this:

      my $old_mtime = (stat($file))[9];
      if (time <= $old_mtime + 300) {
          while (1) {
              sleep 300;
              my $new_mtime = (stat($file))[9];
              last if $old_mtime == $new_mtime;
              $old_mtime = $new_mtime;
          }
      }

        Right, I should have slept() in the loop. I'm not sure how one loop runs stat() a million times though :)

        Fixed my code and re-posted. Thanks for the catch.

Re: When is it safe to move a file?
by repson (Chaplain) on Jan 14, 2001 at 17:49 UTC
    Hmmm, my thoughts.

    Use stat to compare the inode access/modify/change times for the data and control files. Also check that the control file is big enough to be finished. If all the inode times are nearly equal, and the files are reasonable sizes, you may be able to assume the file is okay.

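    use File::Copy;    # provides copy()

    my $name = 'foo.txt';
    my $cnrl_name = $name . '.ctrl';
    # proceed only if the control file is newer than the data file
    # and is more than 10 bytes long
    if ( ((stat($cnrl_name))[9] > (stat($name))[9])
         and ((-s($cnrl_name)) > 10) ) {
        copy $name, "dir/$name";
        unlink $name, $cnrl_name;
    }
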
    It would be better if filenames were always unique or the ftp server used flock or lockf on the files, but it may be possible to deal with your problem. I have no idea if this solution will work, I really don't know all that much about inodes and stuff.

      Thanks repson - I'll give that a go and see how I get on :-)

      Pete

        I'd strongly recommend that you follow kschwab's advice and rename the data file within the same directory before you start to copy or process it. Copying takes time, and you don't want the remote process to start updating the file while your process is in the middle of copying it. Renaming it first will help prevent that.

        Then, after you rename it, check to see if it's still changing, because it's possible that the remote process opened the file just before you renamed it.
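
        A rough sketch of that rename-then-check sequence, under some assumptions: the sub name claim_and_settle, the unique-suffix scheme, and the default timings are all invented for illustration, and "still changing" is judged only by size and mtime.

```perl
use strict;
use warnings;
use File::Basename qw(dirname basename);

# Rename $path to a unique name in the same directory (same filesystem,
# so the rename is atomic), then wait until size and mtime stop changing.
# Returns the new name, or undef if the rename failed or the file vanished.
# $quiet and $poll are in seconds; the defaults are guesses, not tuned values.
sub claim_and_settle {
    my ($path, $quiet, $poll) = @_;
    $quiet = 300 unless defined $quiet;
    $poll  = 30  unless defined $poll;

    my $claimed = dirname($path) . '/' . basename($path) . '.' . $$ . '.' . time;
    rename($path, $claimed) or return;

    my ($last_size, $last_mtime, $stable_since) = (-1, -1, time);
    while (1) {
        my @st = stat($claimed) or return;
        if ($st[7] != $last_size || $st[9] != $last_mtime) {
            # Still changing: remember the new state and restart the clock.
            ($last_size, $last_mtime, $stable_since) = ($st[7], $st[9], time);
        }
        elsif (time - $stable_since >= $quiet) {
            return $claimed;
        }
        sleep $poll;
    }
}
```

        Something like `my $work = claim_and_settle($data_file) or exit;` would then hand back a file name that the sender no longer knows about. A writer that stalls for longer than the quiet period would still fool it.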

        There's still a race condition, but it's much smaller. Proper design would have been for the remote program to never send the same data twice and never use the same filename twice. I also recommend that you find the person who designed this broken protocol and kick them in the ass.

Re: When is it safe to move a file?
by Fastolfe (Vicar) on Jan 14, 2001 at 20:49 UTC
    It's usually standard operating procedure for files like this to be written to a temporary filename, and then renamed to their final name only after the transfer is completed. This essentially guarantees that the file that's there will always be in its final complete form. If your script opens this file, begins reading from it, and the FTP session suddenly comes in and starts saving its own file, these won't clash. If the upload finishes before you do, renaming it simply unlinks your file and points the filename at the new one. Since you still have an open file handle, your data isn't affected. I might note in the control file that this update has been processed prior to you opening the file, just so that in a case like this you can pick up the new change next time.
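
    The open-file-handle behaviour can be tried out with a small experiment like the one below. It assumes a POSIX-style filesystem (such as the Solaris box in question); the sub names and file names are invented for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Small helper: create or overwrite $path with $text.
sub write_file {
    my ($path, $text) = @_;
    open(my $out, '>', $path) or die "open $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";
}

# Returns what a reader sees from a handle opened before the file was
# replaced, followed by what a fresh open of the same name sees afterwards.
sub read_across_replace {
    my $dir   = tempdir(CLEANUP => 1);
    my $final = "$dir/data.txt";
    my $tmp   = "$dir/data.txt.tmp";

    write_file($final, "old contents\n");
    open(my $reader, '<', $final) or die "open $final: $!";

    # The "uploader" writes a temporary file and renames it into place.
    write_file($tmp, "new contents\n");
    rename($tmp, $final) or die "rename: $!";

    my $seen_by_reader = <$reader>;    # handle still refers to the old file
    close $reader;

    open(my $fresh, '<', $final) or die "open $final: $!";
    my $seen_by_new_open = <$fresh>;   # the name now refers to the new file
    close $fresh;

    return ($seen_by_reader, $seen_by_new_open);
}
```

    Calling `read_across_replace()` lets you compare the two values and see which file each handle was attached to.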

    An alternative might be to use lock files. Have the FTP server send a lock file before it starts uploading, remove it when it's done, and have your script honor it. If you have a great deal of control over the FTP process on the other end, such a lock file could work both ways.
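
    On the receiving side, honoring such a lock file could look something like this. The ".lock" naming convention and the sub name ready_to_process are assumptions for the sketch, and the check is only advisory: it does nothing unless the sender really creates and removes the lock file.

```perl
use strict;
use warnings;

# Returns true only when the data file exists and no lock file is present.
# The ".lock" suffix is a hypothetical convention for this sketch.
sub ready_to_process {
    my ($data_file) = @_;
    return 0 if -e "$data_file.lock";   # sender is still uploading
    return -f $data_file ? 1 : 0;
}
```

    There is still a window between this check and whatever the script does next, so it fits best combined with renaming the file before processing.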

Re: When is it safe to move a file?
by BoredByPolitics (Scribe) on Jan 14, 2001 at 23:34 UTC

    Thanks guys - with the advice that's been offered, I can now implement a solution which is as safe as it can be, considering the interface.

    As regards the design of the interface - I had a little input into it, however, the developer of the software which runs on the remote machine had 'done this sort of thing before', and also claimed that no crc/md5 checking needed to be done because 'in [his] experience, ftp is a totally safe transport medium' - ah well, perhaps we can fix the interface with v2 ...

    Pete
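
    For reference, the kind of md5 check mentioned above might be sketched as follows with the core Digest::MD5 module. How the expected digest would reach the receiver (for example, in the control file) is an assumption, since the current interface carries no checksum; the sub names are invented.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hex MD5 digest of a file's contents, read in binary mode.
sub file_md5_hex {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# True when the file matches the digest the sender supplied.
# $expected_hex is assumed to arrive by some side channel.
sub matches_expected_md5 {
    my ($path, $expected_hex) = @_;
    return lc($expected_hex) eq file_md5_hex($path) ? 1 : 0;
}
```

    A mismatch would also catch the case where the data file is picked up while the sender is still overwriting it.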

      'in his experience, ftp is a totally safe transport medium'

      Which, of course, explains the existence of SSH and SCP.

      For all the recent concern I've seen around the Monastery regarding security, I'm surprised nobody has pointed out the inherent flaws in the concept of using FTP to transfer your files, your username, and your password in plain text across the internet.

      SCP, an encrypted drop-in replacement for FTP, is a great alternative, and there is even a Net::SCP module that should allow you to keep using your current scripts, even if they currently rely on Net::FTP.

      And for a secure (encrypted) remote shell prompt in Perl, you can't beat the Net::SSH module.

      I realize it's probably too late (or perhaps completely unfeasible ;-) to switch to these in the middle of your project, but maybe you or someone else who reads this can use these in future projects. Take a look at the secure alternatives to FTP and Telnet; now that the US encryption export laws have changed (and the RSA patents have expired) you can use them for free on almost any OS.

        By 'safe', he meant that the file would arrive at the destination without corruptions.

        Currently, because all the network connections are on a private WAN, security is considered an obstruction! This attitude drives me nuts, but my recommendations fall on deaf ears.

        However, I get the impression that someone higher up is organizing security audits (not just of the computer networks), so hopefully this attitude will fade away once the right incentives are in place (such as dismissals, etc ;-)

        Pete
