We have a file-based process that operates on XML files on Windows (XML file in -> manipulated -> XML file out). Performance of that part is fine.

The box is Windows 2000 Server (or 2003, I forget which), running Cygwin Perl rather than ActiveState (because of the libraries we need).

After the files are written, they are processed by a Perl script. It runs via a cron job every 2 minutes, scans for new files, renames them (adding a timestamp), archives them, and pushes them to a remote machine (tar/scp).

The problem is that, due to some external factors, the cron job doesn't always run every two minutes, so occasionally files back up in the directory waiting to be processed. Right now we have 40,000+ files sitting in that directory. Unfortunately, that many files in one directory has ground the system to a halt. The XML files are individually small, under 1 KB each; there just happen to be a lot of them.

The Perl script uses File::Copy to copy/rename files, unlink to delete them, and tar/scp to move the files around. When the directory is small (under 1,000 files), performance is great. When the directory is large, renames can take over 1 second per file, copies can take over 1 second per file, and unlinks drop to under 10 per second. (I can't unlink multiple files at once because I need to know which file fails.)
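For reference, the per-file pattern described above looks roughly like the following. This is a minimal sketch, not the production script: the helper names `stamp_files` and `delete_files`, the `.xml` filter, and the timestamp format and position are all assumptions made for illustration. It shows one File::Copy `move` and one `unlink` per file, with each failure recorded by name.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Spec;
use POSIX qw(strftime);

# Hypothetical helper: rename every *.xml file in $dir by appending a
# timestamp (format and position are assumptions, not the real script's).
# Returns a hash ref of { filename => error text } for files that failed.
sub stamp_files {
    my ($dir) = @_;
    my %failed;
    my $stamp = strftime('%Y%m%d%H%M%S', localtime);

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    # Read the whole listing first so the renames do not disturb the scan.
    my @files = grep { /\.xml\z/ } readdir($dh);
    closedir($dh);

    for my $name (@files) {
        my $old = File::Spec->catfile($dir, $name);
        my $new = File::Spec->catfile($dir, "$name.$stamp");
        # move() tries a plain rename first and falls back to copy + delete.
        move($old, $new) or $failed{$name} = "$!";
    }
    return \%failed;
}

# Hypothetical helper: delete files one at a time so each failure is known.
sub delete_files {
    my ($dir, @names) = @_;
    my %failed;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        unlink($path) or $failed{$name} = "$!";
    }
    return \%failed;
}
```

Every call here touches the same large directory, which is where the per-file cost shows up.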

We are working on a variety of solutions, but my biggest concern is the processing time of File::Copy and unlink in perl.

So the big question is: is there any faster way to do this work? I've looked around and can't find anything that looks like it might be faster than File::Copy (and unlink).

One item I am considering strongly is attempting to process the files in more of a 'batch' mode. I could shell out and do a system call to 'rm', but then I would have the overhead of the system call and would need to test each individual file to make sure it really got deleted. I also know of no real way to batch a call to File::Copy (or the system copy) so that it handles multiple files at once and tells me which of the batch failed (for logging/error handling).
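One way the batch delete could be sketched without shelling out is below. It relies on the built-in `unlink` accepting a list and returning the number of files it removed, so the per-file existence test is only paid for when the count comes up short. The helper name `unlink_batch` is hypothetical, and I have not measured whether this is any faster on a large directory; Perl still removes the files one by one internally.

```perl
use strict;
use warnings;

# Hypothetical helper: delete a batch of paths with a single unlink call.
# Returns the paths that are still present afterwards (empty list if none).
# A path that never existed is not reported, since it is not present.
sub unlink_batch {
    my (@paths) = @_;
    my $deleted = unlink @paths;

    # Everything was removed: no per-file checks needed.
    return () if $deleted == @paths;

    # Something failed: only now test each path to find the survivors.
    return grep { -e $_ || -l $_ } @paths;
}
```

The survivors can then be logged or retried individually, which keeps the error handling per file while the common case stays a single call.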

Any thoughts or ideas that I should pursue or investigate are greatly appreciated. As well, if this is an OS performance problem that can't really be solved, I'd appreciate knowing that too.

Note: the identical Perl script currently runs under Linux as well. Linux doesn't suffer from this problem as badly, but it also slows down when large numbers of files are in the same directory, though not as significantly. Thus cross-platform ideas would help too. However, if I have to put OS-specific code into my script for performance gains, I am willing to do that. And if I have to install ActiveState for a solution, I'm willing to do that, but I would rather not.

Note: I could throw better hardware at this problem, but I really don't want to have to do this. I feel like there ought to be a better way to go about it via software (hopefully in Perl).

Apologies if this post isn't very clear. I'm busy trying to get this working better and don't have days to write up as clean a description as I would like.

Thanks,

Fendaria

In reply to File::Copy and file manipulation performance by Fendaria
