in reply to Threads slurping a directory and processing before conclusion
> 3. previous attempts have hit major stability and time snags, even at the prototyping stage due to the sheer volume of files that make up a comprehensive sample
I notice (based on the "F:/" pathname) that you're on Win32.
You have a File::Find::find-like recursive file-processing part in your code. This will always be slower than necessary on Win32 when coded in pure Perl.
Consider using/writing some C/XS that generates the file list and avoids all the unnecessary stat() calls (every -d test!) by using FindNextFile().
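To make the idea concrete, here is a minimal sketch of such a walker (an illustration only, not my actual qfind.c). Every FindNextFile() call hands back a WIN32_FIND_DATA that already carries the entry's attributes, so deciding whether to recurse is a flag test rather than an extra stat():

    /*
     * A sketch of walking a tree with FindFirstFile()/FindNextFile().
     * The directory test is a flag already present in the WIN32_FIND_DATA
     * each call returns, so no per-entry stat()/-d round trip is needed.
     */
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    static void walk(const char *dir)
    {
        char pattern[MAX_PATH];
        char path[MAX_PATH];
        WIN32_FIND_DATA fd;
        HANDLE h;

        snprintf(pattern, sizeof pattern, "%s\\*", dir);
        h = FindFirstFile(pattern, &fd);
        if (h == INVALID_HANDLE_VALUE)
            return;

        do {
            if (strcmp(fd.cFileName, ".") == 0 || strcmp(fd.cFileName, "..") == 0)
                continue;                     /* skip self and parent */

            snprintf(path, sizeof path, "%s\\%s", dir, fd.cFileName);

            if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
                walk(path);                   /* attribute check, no stat() */
            else
                printf("%s\n", path);         /* one filename per line */
        } while (FindNextFile(h, &fd));

        FindClose(h);
    }

    int main(int argc, char **argv)
    {
        walk(argc > 1 ? argv[1] : ".");
        return 0;
    }

Compile it with any Win32 C compiler and pipe its output into Perl, just as with qfind.exe in the timing comparison below.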
Also consider using forks rather than threads. They're easier on Win32 than you might think (perl emulates fork() there with pseudo-processes, see perlfork).
Take a look at qfind.c and peg in my CPAN directory for ideas:
http://cpan.mirrors.uk2.net/authors/id/A/AD/ADAVIES/

Try comparing the time taken for qfind to generate a file list against a pure Perl solution, e.g.
c:\> perl -e "${^WIN32_SLOPPY_STAT}=1; use Time::HiRes; $start = Time::HiRes::time; open Q, 'qfind.exe |'; while (<Q>) {}; close Q; print 'Took ', (Time::HiRes::time - $start)"
c:\> perl -e "${^WIN32_SLOPPY_STAT}=1; use Time::HiRes; use File::Find; $start = Time::HiRes::time; File::Find::find(sub { }, '.'); print 'Took ', (Time::HiRes::time - $start)"
On my Perl source directory of ~10_000 files this takes <0.3 sec vs 1.7 sec. I suspect on your 1.2 million files this would give a *considerable* speed-up.
Oh, and make sure you put BEGIN { ${^WIN32_SLOPPY_STAT} = 1 } at the top of your code!
Good luck.
Re^2: Threads slurping a directory and processing before conclusion
by BrowserUk (Patriarch) on Aug 22, 2011 at 16:58 UTC