Re: FTP and File Copying
by merlyn (Sage) on Nov 27, 2000 at 05:54 UTC
|
Always upload to a "temp" file name, then rename to a "good" file name (via FTP rename operations) when the upload is finished. You'll never get a partial file that way.
-- Randal L. Schwartz, Perl hacker | [reply] |
|
|
That is not one of my options. The files need to be taken as is once they are uploaded. I don't have the clout, nor do the submitters have the knowledge, to go in and rename files via FTP. Most of the people uploading these files are lucky to be able to FTP them at all. I cannot enforce a temp-to-final naming convention and expect them to follow it; I am lucky to get them to stick to the naming conventions at all. So the issue is that I need to be able to tell whether a file is currently open and being written to by any other process. I haven't been able to find anything yet that does this. I have thought of workarounds, such as writing filenames and timestamps to a DB, or watching the xferlog for new incoming files, but I would much rather do this at the level of the files themselves than bring in extraneous resources. I've also tried flock, but I can get a full lock even while the file is being written to by another source (for example, a file that is in the middle of being tarred can still be flock'd), which I don't want if the file is still being written to.
Any processes I have control of do go from temp to final filenames when using FTP, where I can.
| [reply] |
|
|
If you are running under a Unix OS, you can use a utility called "lsof" to determine what processes currently have the file open, and the mode used, etc. My lsof binary points to the URL ftp://vic.cc.purdue.edu/pub/tools/unix/lsof, if your system doesn't have it. The exit status of this program is 0 if any programs have the file open.
I really think this is kind of a crappy solution, though, because other programs may have that file open for other reasons that might not be easy to differentiate, and (as mentioned elsewhere), it doesn't help you if the FTP process legitimately closes the file in an incomplete form.
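For what it's worth, a minimal sketch of the lsof check in Python, relying only on the exit status described above (0 when some process has the file open). The function name `is_open_elsewhere` and the injectable `run` parameter are made up for illustration, and it assumes an `lsof` binary is on the PATH. It shares the weaknesses already mentioned: it cannot tell why a file is open, and it says nothing about a file that was closed while incomplete.

```python
import subprocess


def is_open_elsewhere(path, run=subprocess.run):
    """Return True if lsof reports that some process has `path` open.

    `run` defaults to subprocess.run; it is a parameter only so the check
    can be exercised without a real lsof binary (hypothetical design choice).
    """
    result = run(
        ["lsof", "--", path],        # "--" ends option parsing before the file name
        stdout=subprocess.DEVNULL,   # only the exit status matters here
        stderr=subprocess.DEVNULL,
    )
    # lsof exits 0 when it found at least one process holding the file,
    # and non-zero when it found none (or hit an error).
    return result.returncode == 0
```

A copy script would skip any file for which this returns True and pick it up on the next pass.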
| [reply] |
|
|
If you really do have to get them as soon as they are
finished, this sounds like a project for a secure web
server. Of course, that won't work for dial-in folks, but
it could be a good solution.
| [reply] |
Re: FTP and File Copying
by lhoward (Vicar) on Nov 27, 2000 at 18:37 UTC
|
Just to throw another log on the fire... Another possibility might be to watch the FTP server's log and copy files when the FTP server logs them as complete. That way, if you tail the FTP server's log in real time, you could copy the files to their final location immediately instead of scheduling the replication every half hour like you are doing now. | [reply] |
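The log-watching idea above might look roughly like this in Python. This is only a sketch, and it assumes a wu-ftpd-style xferlog line: five date/time tokens, transfer time, remote host, size, filename, then nine trailing fields ending in a completion status of `c` (complete) or `i` (incomplete), with direction `i` meaning an incoming upload. Not every FTP server or version writes the completion-status field, so check your own log first. The function name `completed_uploads` is made up for illustration.

```python
def completed_uploads(lines):
    """Return filenames of incoming transfers that the log marks complete.

    Assumed layout per line (wu-ftpd-style xferlog):
      5 date tokens, transfer-time, host, size, filename...,
      type, action-flag, direction, access-mode, username,
      service, auth-method, auth-user-id, completion-status
    """
    done = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 18:          # too short to be a full xferlog entry
            continue
        # The filename sits between the 8 leading and 9 trailing fields;
        # joining handles names that contain spaces.
        filename = " ".join(tokens[8:-9])
        direction = tokens[-7]        # "i" = incoming, "o" = outgoing
        status = tokens[-1]           # "c" = complete, "i" = incomplete
        if direction == "i" and status == "c":
            done.append(filename)
    return done
```

Feeding this the new lines read from the log on each pass (or from a tail-style follower) gives the list of files that are safe to copy.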
Re: FTP and File Copying
by a (Friar) on Nov 27, 2000 at 09:15 UTC
|
Do you have to process the file right away? Can you do
something like Merlyn's original suggestion: copy the file somewhere
temporary and come back to it? If it matches the incoming file, it's
done; handle and delete. If not, try again. That puts you off one
'check for incoming' loop at first, but you're already waiting
a half hour, so it's not 'real time' critical.
We use a checkfile process: if the file's the same size and untouched
for some number of minutes, it's considered done. I guess part of this
depends on the fallout from handling half-done files vs. the
need for speed.
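The "untouched for some number of minutes" test can be sketched with the file's modification time. This is an assumption-laden illustration, not our actual checkfile process: the name `is_quiet` and the five-minute default are made up, and a stalled upload that resumes later would still fool it.

```python
import os
import time


def is_quiet(path, quiet_seconds=300, now=None):
    """Return True if `path` has not been modified for `quiet_seconds`.

    `now` can be passed in explicitly (epoch seconds) to make the check
    repeatable; by default the current time is used.
    """
    if now is None:
        now = time.time()
    age = now - os.stat(path).st_mtime   # seconds since the last write
    return age >= quiet_seconds
```

The longer the quiet period, the fewer half-done files slip through, at the cost of a longer delay before each file is handled.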
a | [reply] |
Re: FTP and File Copying
by Albannach (Monsignor) on Nov 27, 2000 at 10:32 UTC
|
Since there doesn't appear to be a good solution to this one yet,
how about a bad one ;-)
Your script could record the sizes (etc.) of all the files in the upload
directory some time before the half hour mark, then go to sleep. When the
time comes for doing the copying, the script only works on the
files whose sizes are unchanged from the previous snapshot. I'm
guessing that 5 minutes ought to do, but if your senders are as
iffy as you say, maybe you want to take several snapshots.
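A rough Python sketch of the snapshot comparison just described. The names `snapshot` and `unchanged` are made up for illustration, and recording the modification time alongside the size is an added assumption (it catches a file that was rewritten to the same length).

```python
import os


def snapshot(directory):
    """Map each regular file name in `directory` to (size, mtime)."""
    state = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            st = entry.stat()
            state[entry.name] = (st.st_size, st.st_mtime)
    return state


def unchanged(before, after):
    """Names present in both snapshots with identical size and mtime."""
    return sorted(
        name for name, signature in after.items()
        if before.get(name) == signature
    )
```

The script would call `snapshot` a few minutes before the half-hour mark, sleep, call it again, and copy only the names returned by `unchanged`. Files that first appear in the second snapshot are left for the next round.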
One other idea that you probably considered and discarded, but placed here just in case:
If there is any consistency to the file structure, can you detect whether they are
complete by the content in any way? I'd like to suggest you get the submitters to put a
standard end line of some kind but it sounds like that won't fly.
| [reply] |
|
|
On your last point md5 sums seem like the right idea.
I.e., make a rule that says you must upload an md5
signature along with any file -- that way you know
when a file is done -- it matches its signature.
Obviously this is a problem with less-than-skilled
users.
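Assuming the submitters could be persuaded to upload a checksum file next to each data file, the check might look like this in Python. The sidecar layout (a text file whose first whitespace-separated field is the hex MD5, as md5sum prints it) and the names `md5_of` and `upload_complete` are assumptions for illustration.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_complete(data_path, sum_path):
    """True if the data file's MD5 matches the one recorded in `sum_path`."""
    with open(sum_path) as fh:
        fields = fh.read().split()
    expected = fields[0].lower() if fields else ""   # empty sidecar never matches
    return md5_of(data_path) == expected
```

A partial upload fails the comparison, so the file is simply retried on the next pass until the digest matches.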
| [reply] |
Re: FTP and File Copying
by 2501 (Pilgrim) on Nov 27, 2000 at 07:00 UTC
|
Depending on why you are copying the files, could you instead create links to the files and put the links in the target dir? That would prevent partial copying, but it could be a problem if you are trying to protect the original uploads from corruption.
| [reply] |