> 3. previous attempts have hit major stability and
> time snags, even at the prototyping stage due to the
> sheer volume of files that make up a comprehensive sample

I notice (based on the "F:/" pathname) that you're on Win32.

Your code has a File::Find::find-like recursive file-processing part. On Win32 this is always going to be slower than necessary when coded in Perl.

Consider using or writing some C/XS that generates the file list with FindNextFile(). It returns each entry's attributes (including the directory flag) along with its name, so you avoid all the unnecessary stat calls (the -d tests!).

Also consider using forks over threads. They're easier on Win32 than you might think.
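
A minimal sketch of the fork approach, assuming plain fork() is what you end up using. The file list, the worker count and process_file() are stand-ins I made up for illustration; each child takes every Nth file and the parent just waits for them all.

```perl
use strict;
use warnings;

# Return the files that worker $w (0-based) of $workers should handle:
# every $workers-th entry, starting at offset $w.
sub slice_for_worker {
    my ($w, $workers, @files) = @_;
    return @files[ grep { $_ % $workers == $w } 0 .. $#files ];
}

# Placeholder for the real per-file work (hypothetical).
sub process_file {
    my ($path) = @_;
    print "$$ handled $path\n";
}

my @files   = map { "file$_.txt" } 1 .. 10;   # stand-in for the real list
my $workers = 3;                              # assumed worker count

my @pids;
for my $w (0 .. $workers - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: handle its share of the list, then exit.
        process_file($_) for slice_for_worker($w, $workers, @files);
        exit 0;
    }
    push @pids, $pid;
}

# Parent: wait for every child to finish.
waitpid($_, 0) for @pids;
```

Splitting by index keeps the children independent of each other, so no shared state or locking is needed. Whether three workers is a sensible number for your disk is something you would have to measure.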

Take a look at qfind.c and peg in my CPAN directory for ideas:

http://cpan.mirrors.uk2.net/authors/id/A/AD/ADAVIES/

Try comparing the time qfind takes to generate a file list with that of a pure-Perl solution, e.g.:


c:\> perl -e "${^WIN32_SLOPPY_STAT}=1; use Time::HiRes; $start = Time::HiRes::time; open Q, 'qfind.exe |'; while (<Q>) {}; close Q; print 'Took ', (Time::HiRes::time - $start)"

c:\> perl -e "${^WIN32_SLOPPY_STAT}=1; use Time::HiRes; use File::Find; $start = Time::HiRes::time; File::Find::find(sub { }, '.'); print 'Took ', (Time::HiRes::time - $start)"

On my Perl source directory of ~10_000 files this is <0.3 sec for qfind vs 1.7 sec for File::Find. I suspect that on your 1.2 million files this will give a *considerable* speed-up.
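
If the one-liners are awkward to quote in your shell, the same comparison can be sketched as a small script. The helper names (time_it, count_with_file_find, count_with_command) are my own for illustration, and the qfind.exe call is left commented out because it assumes qfind.exe is on your PATH and lists files under the current directory.

```perl
use strict;
use warnings;
use File::Find;
use Time::HiRes ();

BEGIN { ${^WIN32_SLOPPY_STAT} = 1 }   # only relevant on Win32

# Run a code ref and return (elapsed seconds, its return value).
sub time_it {
    my ($code) = @_;
    my $start  = Time::HiRes::time();
    my $result = $code->();
    return (Time::HiRes::time() - $start, $result);
}

# Pure-Perl walk: count every entry File::Find visits under $dir.
sub count_with_file_find {
    my ($dir) = @_;
    my $n = 0;
    find(sub { $n++ }, $dir);
    return $n;
}

# External lister: count the lines a command prints.
sub count_with_command {
    my ($cmd) = @_;
    open my $fh, '-|', $cmd or die "cannot run $cmd: $!";
    my $n = 0;
    $n++ while <$fh>;
    close $fh;
    return $n;
}

my $dir = @ARGV ? $ARGV[0] : '.';
my ($t, $n) = time_it(sub { count_with_file_find($dir) });
printf "File::Find: %d entries in %.3f s\n", $n, $t;

# Assumes qfind.exe is on PATH; uncomment to compare.
# ($t, $n) = time_it(sub { count_with_command('qfind.exe') });
# printf "qfind:      %d entries in %.3f s\n", $n, $t;
```

Run each variant a couple of times before comparing, since the first pass over a directory tree also pays for filling the OS file cache.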

Oh, and make sure you put BEGIN { ${^WIN32_SLOPPY_STAT} = 1 } at the top of your code!

Good luck.


In reply to Re: Threads slurping a directory and processing before conclusion by Clarendon4
in thread Threads slurping a directory and processing before conclusion by TRoderic
