in reply to Threads slurping a directory and processing before conclusion
> 3. previous attempts have hit major stability and time snags, even at the prototyping stage due to the sheer volume of files that make up a comprehensive sample
I notice (based on the "F:/" pathname) that you're on Win32.
You have a File::Find::find-like recursive file-processing part in your code. This will always be slower than necessary on Win32 when coded in pure Perl.
Consider using/writing some C/XS that generates the file list and avoids all the unnecessary stat() calls (every -d test!) by using FindNextFile().
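To make the idea concrete, here is a minimal sketch of such a walker (an illustration only, not my actual qfind.c). Every FindNextFile() call hands back a WIN32_FIND_DATA that already carries the entry's attributes, so deciding whether to recurse is a flag test rather than an extra stat():

    /*
     * A sketch of walking a tree with FindFirstFile()/FindNextFile().
     * The directory test is a flag already present in the WIN32_FIND_DATA
     * each call returns, so no per-entry stat()/-d round trip is needed.
     */
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    static void walk(const char *dir)
    {
        char pattern[MAX_PATH];
        char path[MAX_PATH];
        WIN32_FIND_DATA fd;
        HANDLE h;

        snprintf(pattern, sizeof pattern, "%s\\*", dir);
        h = FindFirstFile(pattern, &fd);
        if (h == INVALID_HANDLE_VALUE)
            return;

        do {
            if (strcmp(fd.cFileName, ".") == 0 || strcmp(fd.cFileName, "..") == 0)
                continue;                     /* skip self and parent */

            snprintf(path, sizeof path, "%s\\%s", dir, fd.cFileName);

            if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
                walk(path);                   /* attribute check, no stat() */
            else
                printf("%s\n", path);         /* one filename per line */
        } while (FindNextFile(h, &fd));

        FindClose(h);
    }

    int main(int argc, char **argv)
    {
        walk(argc > 1 ? argv[1] : ".");
        return 0;
    }

Compile it with any Win32 C compiler and pipe its output into Perl, just as with qfind.exe in the timing comparison below.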
Also consider using forks rather than threads. They're easier on Win32 than you might think (perl emulates fork() there with pseudo-processes, see perlfork).
Take a look at qfind.c and peg in my CPAN directory for ideas:
http://cpan.mirrors.uk2.net/authors/id/A/AD/ADAVIES/

Try comparing the time taken for qfind to generate a file list against a pure Perl solution, e.g.
c:\> perl -e "${^WIN32_SLOPPY_STAT}=1; use Time::HiRes; $start = Time::HiRes::time; open Q, 'qfind.exe |'; while (<Q>) {}; close Q; print 'Took ', (Time::HiRes::time - $start)"
c:\> perl -e "${^WIN32_SLOPPY_STAT}=1; use Time::HiRes; use File::Find; $start = Time::HiRes::time; File::Find::find(sub { }, '.'); print 'Took ', (Time::HiRes::time - $start)"
On my Perl source directory of ~10_000 files this takes <0.3 sec vs 1.7 sec. I suspect on your 1.2 million files this would give a *considerable* speed-up.
Oh, and make sure you put BEGIN { ${^WIN32_SLOPPY_STAT} = 1 } at the top of your code!
Good luck.
Re^2: Threads slurping a directory and processing before conclusion
by BrowserUk (Patriarch) on Aug 22, 2011 at 16:58 UTC