BoredByPolitics has asked for the wisdom of the Perl Monks concerning the following question:
I'm writing a program which has to work with what I now believe is a broken interface. Its purpose is to process data files after they've been ftp'd into a specific directory. My program will probably be invoked via cron, so sometimes there won't be any data to process, which is no problem.
After the data file is written via ftp, a control file is also written via ftp; it's the existence of this control file that tells my program that it's safe to proceed.
Now, this is where the fun starts. If my program hasn't processed the data file by the time the machine which originally sent it has more data to send, the other machine sends the original data, plus the new data, as one data file, overwriting the original data file, and then updates the control file accordingly.
I move the data file to a work directory before processing it. If I use the File::Copy::move subroutine, what will happen if I try to move the file while it's being overwritten by the other machine?
I can't now change the way the remote machine behaves, and I don't know in advance what the ftp server is going to be on the box my program is to run on (although I do know that the OS is Solaris).
Does anyone have any advice on how I can make my program work safely?
Thanks.
Pete
Re: When is it safe to move a file?
by kschwab (Vicar) on Jan 14, 2001 at 20:12 UTC
I would recommend renaming both files before you start. If the ftp server is actively writing data to the files, its writes will simply continue into the renamed files, since an open file descriptor follows the inode, not the name. If the ftp client starts sending a new file after the rename(), it won't overwrite the renamed files. You can then loop for some reasonable amount of time, and if the mtime doesn't change, you can guess that the ftp server isn't writing to them anymore.
Something like:
if ( -f $control_file && -f $data_file ) {
    rename( $data_file,    $data_file . $$ );
    rename( $control_file, $control_file . $$ );
} else {
    exit;
}

while (1) {
    $mtime = ( stat( $data_file . $$ ) )[9];
    # drop out of the loop if the file hasn't
    # changed in 5 minutes (perhaps longer
    # for a wan connection?)
    last if ( time > $mtime + 300 );
    sleep(30);
}

# process the files
#process the files
This still leaves a small race condition between the two rename()s, in which the control file might have been overwritten (a really small timeslice, but a timeslice nonetheless...).
You mentioned you didn't have control over the process, but if you ever do get control, here are two ideas:
- Have the ftp client send the files as file.tmp, then use the "rename file.tmp file" ftp command once the file is sent.
- Lacking that, install an ftp server that does something similar. proftpd has a configuration parameter called HiddenStor that does this.
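For reference, a sketch of what that proftpd setting might look like. The directive name is as given above; newer proftpd releases spell it HiddenStores, so check the docs for your version:

```
# proftpd.conf fragment (sketch): uploads land in a hidden temporary
# file and are renamed into place only when the transfer completes.
<Directory /home/ftp/incoming>
  HiddenStor on
</Directory>
```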
Update: yep..I missed a sleep in the lower loop, which would burn lots of resources...fixed
my $old_mtime = (stat($file))[9];
if ( time <= $old_mtime + 300 ) {
    while (1) {
        sleep 300;
        my $new_mtime = (stat($file))[9];
        last if $old_mtime == $new_mtime;
        $old_mtime = $new_mtime;
    }
}
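The two polling snippets in this thread can be folded into one helper. This is only a sketch; the name wait_until_quiet and its parameters are my own invention, not code from the thread:

```perl
use strict;
use warnings;

# Block until $file's mtime has been stable for $window seconds,
# polling every $poll seconds. Returns 1 once the file has gone
# quiet, or 0 if the file disappears while we're waiting.
sub wait_until_quiet {
    my ($file, $window, $poll) = @_;
    while (1) {
        my $mtime = (stat($file))[9];
        return 0 unless defined $mtime;      # file vanished
        return 1 if time >= $mtime + $window;
        sleep $poll;
    }
}
```

You would call it on the renamed file, e.g. `wait_until_quiet($data_file . $$, 300, 30) or die "file disappeared";`.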
Re: When is it safe to move a file?
by repson (Chaplain) on Jan 14, 2001 at 17:49 UTC
Hmmm, my thoughts. Use stat to compare the inode access/modify/change times for the data and control files. Also check that the control file is big enough to be finished. If all the inode times are nearly equal, and the files are reasonable sizes, you may be able to assume the file is okay.
use File::Copy;

my $name      = 'foo.txt';
my $ctrl_name = $name . '.ctrl';

# Control file newer than the data file, and big enough to be complete?
if ( (stat($ctrl_name))[9] > (stat($name))[9] and -s $ctrl_name > 10 ) {
    copy( $name, "dir/$name" ) or die "copy failed: $!";
    unlink $name, $ctrl_name;
}
It would be better if filenames were always unique or the ftp server used flock or lockf on the files, but it may be possible to deal with your problem. I have no idea if this solution will work; I really don't know all that much about inodes and stuff.
I'd strongly recommend that you follow kschwab's advice and rename the data file within the same directory before you start to copy or process it. Copying takes time, and you don't want the remote process to start updating the file while your process is in the middle of copying it. Renaming it first will help prevent that.
Then, after you rename it, check to see if
it's still changing, because it's possible that
the remote process opened the file just before you renamed it.
There's still a race condition, but it's much smaller.
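A sketch of that sequence in Perl. File::Copy::move is from the original question; the helper name and directory layout are mine, and the wait step is only indicated by a comment:

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Basename qw(basename);

# Claim $data_file by renaming it within its own directory (rename()
# is atomic when source and target are on the same filesystem), then
# move the claimed file into $work_dir. Returns the new path, or
# undef if the data file wasn't there to claim. In real use you would
# wait for the mtime to stop changing between the rename and the move.
sub claim_and_move {
    my ($data_file, $work_dir) = @_;
    my $claimed = "$data_file.$$";
    rename $data_file, $claimed or return undef;
    # ...here: poll until the claimed file's mtime goes quiet...
    move($claimed, $work_dir) or die "move failed: $!";
    return "$work_dir/" . basename($claimed);
}
```

Because the rename stays within one directory, it cannot fall back to a slow copy; only the later move() to the work directory might cross filesystems.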
Proper design would have been for the remote program to never send
the same data twice and never use the same filename twice.
I also recommend that you find the person who designed
this broken protocol and kick them in the ass.
Re: When is it safe to move a file?
by Fastolfe (Vicar) on Jan 14, 2001 at 20:49 UTC
It's usually standard operating procedure for files like this to be written to a temporary filename, and then renamed to their final name only after the transfer is completed. This essentially guarantees that the file that's there will always be in its final, complete form. If your script opens this file, begins reading from it, and the FTP session suddenly comes in and starts saving its own file, these won't clash. In the event it's finished before you are, by renaming it, it simply unlinks your file and replaces the filename with the new one. Since you still have an open file handle, your data isn't affected. I might note in the control file that this update has been processed prior to opening the file, just so that in a case like this you can pick up the new change next time.
An alternative might be to use lock files. Have the FTP server send a lock file before it starts uploading, remove it when it's done, and have your script honor it. If you have a great deal of control over the FTP process on the other end, such a lock file could work both ways.
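A minimal sketch of honoring such a lock file. The "$data_file.lock" naming convention is an assumption for this sketch, not part of the thread's actual protocol:

```perl
use strict;
use warnings;

# Returns true when it's safe to process $data_file, i.e. the
# uploader's lock file is absent. The ".lock" suffix is assumed;
# both sides would have to agree on the convention.
sub safe_to_process {
    my ($data_file) = @_;
    return !-e "$data_file.lock";
}
```

A cron-driven processor would simply exit early when safe_to_process() is false and try again on the next run.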
Re: When is it safe to move a file?
by BoredByPolitics (Scribe) on Jan 14, 2001 at 23:34 UTC
Thanks guys - with the advice that's been offered, I can now implement a solution which is as safe as it can be, considering the interface.
As regards the design of the interface - I had a little input into it, however, the developer of the software which runs on the remote machine had 'done this sort of thing before', and also claimed that no crc/md5 checking needed to be done because 'in [his] experience, ftp is a totally safe transport medium' - ah well, perhaps we can fix the interface with v2 ...
Pete
'in his experience, ftp is a totally safe transport medium'
Which, of course, explains the existence of SSH and SCP.
For all the recent concern I've seen around the Monastery regarding security, I'm surprised nobody has pointed out the inherent flaws in the concept of using FTP to transfer your files, your username, and your password in plain text across the internet.
SCP, an encrypted drop-in replacement for FTP, is a great alternative, and there is even a Net::SCP module that should allow you to keep using your current scripts, even if they currently rely on Net::FTP.
And for a secure (encrypted) remote shell prompt in Perl, you can't beat the Net::SSH module.
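A hedged sketch of what a Net::SCP transfer might look like. The hostname, username, and paths are placeholders, and you should check the module's documentation for the exact interface shipped with your version:

```perl
use strict;
use warnings;
use Net::SCP;

# Copy a local file to a remote host over SSH instead of FTP.
# 'remote.example.com', 'pete', and the paths are placeholders.
my $scp = Net::SCP->new('remote.example.com');
$scp->login('pete') or die "login failed: " . $scp->{errstr};
$scp->put( 'data.txt', '/incoming/data.txt' )
    or die "put failed: " . $scp->{errstr};
```

This assumes key-based SSH authentication is already set up, since scp itself prompts for passwords on the terminal.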
I realize it's probably too late (or perhaps completely unfeasible ;-) to switch to these in the middle of your project, but maybe you or someone else who reads this can use these in future projects. Take a look at the secure alternatives to FTP and Telnet; now that the US encryption export laws have changed (and the RSA patents have expired) you can use them for free on almost any OS.
By 'safe', he meant that the file would arrive at the destination without corruption.
Currently, because all the network connections are on a private WAN, security is considered an obstruction! This attitude drives me nuts, but my recommendations fall on deaf ears.
However, I get the impression that someone higher up is organizing security audits (not just of the computer networks), so hopefully this attitude will fade away once the right incentives are in place (such as dismissals, etc ;-)
Pete