When is it safe to move a file?

BoredByPolitics has asked for the wisdom of the Perl Monks concerning the following question:

Re: When is it safe to move a file? by kschwab (Vicar) on Jan 14, 2001 at 20:12 UTC
I would recommend renaming both files before you start. If the ftp server is actively writing data to the files, it will keep writing to the same files under their new names. If the ftp client starts sending a new file after the rename(), it won't overwrite the renamed files. You can then loop for some reasonable amount of time, and if the mtime doesn't change, you can guess that the ftp server isn't writing to them anymore. Something like:
`if ( -f $control_file && -f $data_file ) {
    rename($data_file, $data_file . $$) or die "rename failed: $!";
    rename($control_file, $control_file . $$) or die "rename failed: $!";
} else {
    exit;
}
while (1) {
    my $mtime = (stat($data_file . $$))[9];
    # drop out of the loop if the file hasn't
    # changed in 5 minutes (perhaps longer
    # for a wan connection ?)
    last if (time > $mtime + 300);
    sleep(30);
}
# process the files`
This still leaves a small race condition between the two rename() calls, in which the control file might be overwritten (a really small timeslice, but a timeslice nonetheless).
You mentioned you didn't have control over the process, but if you ever do get control, here are two ideas: Have the ftp client send the files as file.tmp, then use the "rename file.tmp file" ftp command once the file has been sent. Lacking that, install an ftp server that does something similar. proftpd has a configuration parameter called HiddenStor that does this.
Update: yep, I missed a sleep in the lower loop, which would burn lots of resources. Fixed.
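The first idea above (send under a temporary name, then rename) could look roughly like this on the sending side. This is only a sketch using Net::FTP; the helper name `upload_then_rename`, the host, and the credentials are placeholders I made up, not anything from the original setup.

```perl
use strict;
use warnings;
use Net::FTP;

# Hypothetical helper: upload $local as "$remote.tmp", then rename it to
# $remote only after the transfer has completed successfully.
sub upload_then_rename {
    my ($ftp, $local, $remote) = @_;
    my $tmp = "$remote.tmp";

    $ftp->binary;
    $ftp->put($local, $tmp)
        or die "put failed: " . $ftp->message;
    $ftp->rename($tmp, $remote)
        or die "rename failed: " . $ftp->message;
    return $remote;
}

# Usage (placeholder host and credentials):
# my $ftp = Net::FTP->new('ftp.example.com') or die "connect failed: $@";
# $ftp->login('user', 'password') or die "login failed: " . $ftp->message;
# upload_then_rename($ftp, 'data.txt', 'data.txt');
# $ftp->quit;
```

The receiving script then only ever looks at names without the .tmp suffix, so anything it sees has been fully transferred.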
Re: When is it safe to move a file? by Dominus (Parson) on Jan 14, 2001 at 22:41 UTC
Says kschwab:
Something like:
`while (1) {
    $mtime=(stat($data_file . $$))[9];
    # drop out of the loop if the file hasn't
    # changed in 5 minutes (perhaps longer
    # for a wan connection ?)
    last if (time > $mtime + 300);
}`
That is a busy-wait loop. Even if the loop runs only once, you have called `stat` one million times and run up the load on the system for five minutes. Try it like this:
`my $old_mtime = (stat($file))[9];
if (time <= $old_mtime + 300) {
    while (1) {
        sleep 300;
        my $new_mtime = (stat($file))[9];
        last if $old_mtime == $new_mtime;
        $old_mtime = $new_mtime;
    }
}`
Re: Re: When is it safe to move a file? by kschwab (Vicar) on Jan 15, 2001 at 03:27 UTC
Right, I should have called sleep() in the loop. I'm not sure how one loop runs stat() a million times though :) Fixed my code and re-posted. Thanks for the catch.
Re: When is it safe to move a file? by Dominus (Parson) on Jan 15, 2001 at 04:01 UTC
Re: Re: When is it safe to move a file? by kschwab (Vicar) on Jan 15, 2001 at 04:20 UTC
Re: When is it safe to move a file? by repson (Chaplain) on Jan 14, 2001 at 17:49 UTC
Hmmm, my thoughts. Use `stat` to compare the inode access/modify/change times for the data and control files. Also check that the control file is big enough to be finished. If all the inode times are nearly equal, and the files are reasonable sizes, you may be able to assume the file is okay.
`use File::Copy;
my $name = 'foo.txt';
my $cnrl_name = $name . '.ctrl';
# control file modified after the data file, and larger than 10 bytes
if ( ((stat($cnrl_name))[9] > (stat($name))[9]) and (-s $cnrl_name) > 10 ) {
    copy($name, "dir/$name") or die "copy failed: $!";
    unlink $name, $cnrl_name;
}`
It would be better if filenames were always unique or the ftp server used flock or lockf on the files, but it may be possible to deal with your problem. I have no idea if this solution will work, I really don't know all that much about inodes and stuff.
Re: Re: When is it safe to move a file? by BoredByPolitics (Scribe) on Jan 14, 2001 at 19:00 UTC
Thanks repson - I'll give that a go and see how I get on :-)
Pete
Re: When is it safe to move a file? by Dominus (Parson) on Jan 14, 2001 at 22:23 UTC
I'd strongly recommend that you follow kschwab's advice and rename the data file within the same directory before you start to copy or process it. Copying takes time, and you don't want the remote process to start updating the file while your process is in the middle of copying it. Renaming it first will help prevent that. Then, after you rename it, check to see if it's still changing, because it's possible that the remote process opened the file just before you renamed it. There's still a race condition, but it's much smaller. Proper design would have been for the remote program to never send the same data twice and never use the same filename twice. I also recommend that you find the person who designed this broken protocol and kick them in the ass.
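Putting the rename and the "is it still changing?" check together might look something like the sketch below. The sub name `claim_and_wait` and the default timings are my own illustrative choices; pick a quiet period that suits your link.

```perl
use strict;
use warnings;

# Rename $file within its own directory (same filesystem, so the rename is
# a single atomic step), then wait until its mtime has been unchanged for
# at least $quiet seconds. Returns the new name.
# $quiet and $poll are illustrative defaults, not tuned values.
sub claim_and_wait {
    my ($file, $quiet, $poll) = @_;
    $quiet = 300 unless defined $quiet;
    $poll  = 30  unless defined $poll;

    my $claimed = "$file.$$";
    rename($file, $claimed) or die "cannot rename $file: $!";

    while (1) {
        my $mtime = (stat($claimed))[9];
        defined $mtime or die "cannot stat $claimed: $!";
        last if time() - $mtime >= $quiet;   # quiet for long enough
        sleep $poll;                         # don't busy-wait
    }
    return $claimed;
}

# Usage: my $safe_name = claim_and_wait('incoming/data.txt');
```

This narrows the race but does not remove it: a writer that has stalled for longer than the quiet period will still look finished.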
Re: When is it safe to move a file? by Fastolfe (Vicar) on Jan 14, 2001 at 20:49 UTC
It's usually standard operating procedure for files like this to be written to a temporary filename, and then renamed to their final name only after the transfer is completed. This essentially guarantees that the file that's there will always be in its final complete form. If your script opens this file, begins reading from it, and the FTP session suddenly comes in and starts saving its own file, these won't clash. If the transfer finishes before you do, the rename simply unlinks your file and points the filename at the new one. Since you still have an open file handle, your data isn't affected. I might note in the control file that this update has been processed prior to you opening the file, just so that in a case like this you can pick up the new change next time.
An alternative might be to use lock files. Have the FTP server send a lock file before it starts uploading, remove it when it's done, and have your script honor it. If you have a great deal of control over the FTP process on the other end, such a lock file could work both ways.
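Here's a small local demonstration of the write-to-temp-then-rename behaviour described above, with no FTP involved. It assumes Unix-style rename semantics (replacing a file that another handle has open); the sub name `publish` and the file names are just for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write $content to "$final.tmp", then rename it over $final in one step.
sub publish {
    my ($final, $content) = @_;
    my $tmp = "$final.tmp";
    open(my $out, '>', $tmp) or die "cannot write $tmp: $!";
    print {$out} $content;
    close($out) or die "cannot close $tmp: $!";
    rename($tmp, $final) or die "cannot rename $tmp: $!";
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";

publish($file, "first version\n");
open(my $in, '<', $file) or die "cannot read $file: $!";

# A new transfer replaces the file while we still hold it open.
publish($file, "second version\n");

my $seen = <$in>;       # read through the handle opened before the replace
close $in;

open(my $again, '<', $file) or die "cannot reopen $file: $!";
my $now = <$again>;     # read the file currently at that name
close $again;

print "open handle saw:   $seen";
print "reopened file has: $now";
```

The reader that was already open keeps seeing the old contents, and anyone opening the name afterwards gets the complete new file, never a half-written one.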
Re: When is it safe to move a file? by BoredByPolitics (Scribe) on Jan 14, 2001 at 23:34 UTC
Thanks guys - with the advice that's been offered, I can now implement a solution which is as safe as it can be, considering the interface. As regards the design of the interface - I had a little input into it, however, the developer of the software which runs on the remote machine had 'done this sort of thing before', and also claimed that no crc/md5 checking needed to be done because 'in [his] experience, ftp is a totally safe transport medium' - ah well, perhaps we can fix the interface with v2 ...
Pete
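For what it's worth, the md5 check that was ruled out is only a few lines with the core Digest::MD5 module. This sketch assumes the sender would supply the expected digest somehow, for example in the control file, which the current interface does not do; the sub names are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file, read in binary mode.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Assumption: the expected digest arrives separately, e.g. in the control file.
sub data_is_intact {
    my ($data_file, $expected_md5) = @_;
    return lc($expected_md5) eq file_md5($data_file);
}

# Usage: process the file only if data_is_intact($data_file, $md5_from_control).
```

A matching digest also doubles as a "transfer finished" signal, since a partly written file won't match.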
Re: Re: When is it safe to move a file? by Dragonfly (Priest) on Jan 15, 2001 at 00:13 UTC
'in his experience, ftp is a totally safe transport medium'
Which, of course, explains the existence of SSH and SCP. For all the recent concern I've seen around the Monastery regarding security, I'm surprised nobody has pointed out the inherent flaws in the concept of using FTP to transfer your files, your username, and your password in plain text across the internet.
SCP, an encrypted drop-in replacement for FTP, is a great alternative, and there is even a Net::SCP module that should allow you to keep using your current scripts, even if they currently rely on Net::FTP. And for a secure (encrypted) remote shell prompt in Perl, you can't beat the Net::SSH module.
I realize it's probably too late (or perhaps completely unfeasible ;-) to switch to these in the middle of your project, but maybe you or someone else who reads this can use these in future projects. Take a look at the secure alternatives to FTP and Telnet; now that the US encryption export laws have changed (and the RSA patents have expired) you can use them for free on almost any OS.
Re: Re: Re: When is it safe to move a file? by BoredByPolitics (Scribe) on Jan 15, 2001 at 01:17 UTC
By 'safe', he meant that the file would arrive at the destination without corruption. Currently, because all the network connections are on a private WAN, security is considered an obstruction! This attitude drives me nuts, but my recommendations fall on deaf ears. However, I get the impression that someone higher up is organizing security audits (not just of the computer networks), so hopefully this attitude will fade away once the right incentives are in place (such as dismissals, etc ;-)
Pete

