Fendaria has asked for the wisdom of the Perl Monks concerning the following question:
We have a file-based process that operates on XML files on Windows (XML file in -> manipulated -> XML file out). (Performance of that step is fine.)
Windows box, 2000 Server (or 2003, I forget which). Cygwin Perl, not ActiveState (because of library dependencies).
After the files are written, they are processed by a Perl script. It runs via a cron job every 2 minutes, scans for new files, renames them (adds a timestamp), archives them, and pushes them to a remote machine (tar/scp).
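Roughly, one run of the script looks like this (a simplified sketch; the directory names, timestamp format, and remote host here are placeholders, not the real ones):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(move);
use POSIX qw(strftime);

# Simplified sketch of one run: scan, timestamp-rename, tar, scp.
# Directory names, timestamp format, and remote host are made up.
my $stamp = strftime('%Y%m%d%H%M%S', localtime);
my @stamped;

for my $xml (glob 'incoming/*.xml') {
    my ($name) = $xml =~ m{([^/]+)$};
    my $new    = "staging/$name.$stamp";
    unless (move($xml, $new)) {
        warn "rename of $xml failed: $!\n";
        next;
    }
    push @stamped, $new;
}

if (@stamped) {
    system('tar', '-czf', "archive/batch-$stamp.tar.gz", @stamped) == 0
        or warn "tar failed (status $?)\n";
    system('scp', "archive/batch-$stamp.tar.gz", 'remotehost:/inbound/') == 0
        or warn "scp failed (status $?)\n";
}
```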
The problem is that, due to some external factors, the cron job doesn't always run every two minutes, so occasionally the files back up in the directory waiting to be processed. Right now we have 40,000+ files sitting in that directory, and the sheer number of files has ground the system to a halt. The XML files are individually small, under 1k each; there just happen to be a lot of them.
The Perl script uses File::Copy to copy/rename files, unlink to delete them, and tar/scp to move the files around. When the directory is small (under 1,000 files) performance is great. When the directory is large, renames can take over 1 second per file, copies can take over 1 second per file, and unlinks drop to under 10 per second. (I can't unlink multiple files at once because I need to know which file fails.)
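Those per-file numbers can be confirmed by wrapping the individual calls with Time::HiRes, something like this sketch (the paths here are placeholders):

```perl
use strict;
use warnings;
use File::Copy qw(move);
use Time::HiRes qw(gettimeofday tv_interval);

# Time a single rename so the per-file cost can be logged.
# $src and $dst are placeholders, not the real paths.
my ($src, $dst) = ('incoming/example.xml', 'staging/example.xml.20051206');

my $t0      = [gettimeofday];
my $ok      = move($src, $dst);
my $elapsed = tv_interval($t0);

if ($ok) {
    printf "moved %s in %.3f seconds\n", $src, $elapsed;
} else {
    warn "move of $src failed after $elapsed seconds: $!\n";
}
```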
We are working on a variety of solutions, but my biggest concern is the processing time of File::Copy and unlink in Perl.
So the big question is: is there any faster way to do this work? I've looked around and can't find anything that looks like it might be faster than File::Copy (and unlink).
One item I am strongly considering is attempting to process the files in more of a 'batch' mode. I could shell out and do a system call to 'rm', but then I would have the overhead of the system call and would still need to test each individual file to make sure it really got deleted. I also know of no real way to batch a call to File::Copy (or the system copy) to handle multiple files at once (and tell me which of the batch failed, for logging/error handling).
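For illustration, a plain loop over unlink keeps the per-file failure information, roughly like this (the glob pattern stands in for however the scan builds its list), but it still pays one call per file:

```perl
use strict;
use warnings;

# Delete in one pass while remembering exactly which files failed and why.
# The glob pattern is a placeholder for however the file list is built.
my @to_delete = glob 'archive/done/*.xml';
my @failed;

for my $file (@to_delete) {
    unlink $file or push @failed, [ $file, "$!" ];
}

warn "could not delete $_->[0]: $_->[1]\n" for @failed;
```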
Any thoughts or ideas that I should pursue or investigate are greatly appreciated. As well, if this is an OS performance problem that can't really be solved, I'd appreciate knowing that too.
Note: the identical Perl script currently runs under Linux as well. Linux doesn't suffer from this problem as badly, but it does also slow down when large numbers of files are in the same directory, just not as significantly. Thus cross-platform ideas would help too. However, if I have to put OS-specific code into my script for performance gains, I am willing to do that. And if I have to install ActiveState for a solution, I'm willing to do that, but I would rather not.
Note: I could throw better hardware at this problem, but I really don't want to have to do this. I feel like there ought to be a better way to go about it via software (hopefully in Perl).
Apologies if this post isn't entirely clear; I'm busy trying to get this working better and don't have days to write up as clean a description as I would like.
Thanks,
Replies are listed 'Best First'.
Re: File::Copy and file manipulation performance
by Fletch (Bishop) on Dec 06, 2005 at 21:03 UTC

Re: File::Copy and file manipulation performance
by psychotic (Beadle) on Dec 06, 2005 at 22:15 UTC
    by Fendaria (Beadle) on Dec 06, 2005 at 22:31 UTC
    by psychotic (Beadle) on Dec 06, 2005 at 22:45 UTC
    by Fendaria (Beadle) on Dec 06, 2005 at 23:16 UTC

Re: File::Copy and file manipulation performance
by tirwhan (Abbot) on Dec 06, 2005 at 21:52 UTC

Re: File::Copy and file manipulation performance
by Perl Mouse (Chaplain) on Dec 07, 2005 at 00:17 UTC

Re: File::Copy and file manipulation performance
by diotalevi (Canon) on Dec 06, 2005 at 22:39 UTC
    by tirwhan (Abbot) on Dec 06, 2005 at 23:11 UTC