The fastest way to use File::Find is to turn off the 'count nlinks' "optimization" and then avoid things like "-s $_", using "-s _" instead so that the cached stat result gets reused (note that a bare "-f" is just short for "-f $_" and so should become "-f _").

Note that treatment of "-l _" can sometimes be a problem with this scheme. Unfortunately, Perl doesn't allow you to cache more than one stat/lstat result so even if File::Find does both lstat and stat, you'll only have access to the last one done.
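To make the single-slot stat cache concrete, here is a minimal sketch. The file and link names are made up for illustration, and it assumes a platform where symlink works (the checks are skipped otherwise). It shows that "_" only ever reflects the most recent stat or lstat:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with one plain file and (where supported) a symlink to it.
my $tmp  = tempdir( CLEANUP => 1 );
my $file = "$tmp/data.txt";    # hypothetical name
open my $fh, '>', $file or die "Cannot create $file: $!";
binmode $fh;
print {$fh} 'hello';           # 5 bytes
close $fh or die "Cannot close $file: $!";

my $link = "$tmp/link";        # hypothetical name
my $have_symlink = eval { symlink( $file, $link ) } ? 1 : 0;

my %seen;
if ($have_symlink) {
    # lstat describes the link itself, and "_" now holds that result.
    lstat($link) or die "lstat failed: $!";
    $seen{lstat_is_link} = ( -l _ ) ? 1 : 0;
    $seen{lstat_is_file} = ( -f _ ) ? 1 : 0;

    # stat follows the link and overwrites the cache; the lstat result is gone.
    stat($link) or die "stat failed: $!";
    $seen{stat_is_file} = ( -f _ ) ? 1 : 0;
    $seen{stat_size}    = -s _;
}

printf "%-14s %s\n", $_, $seen{$_} for sort keys %seen;
```

After the stat call, the only way to ask about the link itself again is a fresh lstat; Perl treats "-l _" following a plain stat as an error rather than guessing.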

The code for File::Find has become quite convoluted and I'm not going to spend hours trying to track what it is doing. But I can reason from what needs to be done (and from what I do when I roll my own replacement for File::Find, which I often find easier than trying to figure out the subtle vagaries of File::Find). If you set $File::Find::dont_use_nlink= 1 and don't ask File::Find to follow symbolic links, then File::Find has to lstat every file and doesn't need to stat any of them. So your "wanted" sub should get called such that "-l _" tells you whether or not the found item is a symbolic link. (You can't tell anything about what the symbolic link points to without issuing your own stat, which means not using the "_" stat cache.) And this is usually exactly what you want.

So my suggestions for changes to your code are:

    #...
    use File::Find;
    $File::Find::dont_use_nlink= 1; # Avoid slowing "optimization"
    #...
    my @dirs = qw(
        /nas/fs001 /nas/fs002 /nas/fs003 /nas/fs003 /nas/fs004
        /nas/fs005 /nas/fs006 /nas/fs007 /nas/fs008 /nas/fs009
        /nas/fs010 /nas/fs011 /nas/fs012 /nas/fs013 /nas/fs014
    );
    #...
        my $dir = pop @_; # 1 arg only, because that lets me thread.
    #...
        if ( -f _ && ! -l _ ) {
            my $filesize = -s _;
    #...
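For anyone who wants something runnable to experiment with, here is a self-contained sketch of the same idea against a scratch directory. The helper names (write_file, total_plain_file_size) and the test files are hypothetical. One deliberate difference from the fragments above: as far as I can tell, the File::Find documentation only promises an lstat before "wanted" when follow or follow_fast is set, so this sketch issues one explicit lstat per entry and then reuses "_" for the remaining tests. That is still one system call per entry rather than one per file test.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

$File::Find::dont_use_nlink = 1;    # turn off the nlink-counting "optimization"

# Hypothetical helper: write $content to $path as raw bytes.
sub write_file {
    my ( $path, $content ) = @_;
    open my $fh, '>', $path or die "Cannot create $path: $!";
    binmode $fh;
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: total size in bytes of the plain files under $dir.
sub total_plain_file_size {
    my ($dir) = @_;
    my $total = 0;
    find(
        sub {
            # One explicit lstat per entry (an assumption of this sketch),
            # so the "_" cache is known to describe $_.
            lstat($_) or return;
            return unless -f _;         # false for symlinks, since we lstat'ed
            $total += ( -s _ ) || 0;    # reuses the cached lstat result
        },
        $dir
    );
    return $total;
}

# Build a tiny tree: two plain files, one subdirectory, one symlink.
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/sub" or die "Cannot mkdir $root/sub: $!";
write_file( "$root/a.txt",     'hello' );      # 5 bytes
write_file( "$root/sub/b.txt", 'goodbye' );    # 7 bytes
eval { symlink( "$root/a.txt", "$root/link" ) };    # skipped where unsupported

my $total = total_plain_file_size($root);
print "plain files total $total bytes\n";
```

The symlink is not counted because the cached result comes from lstat, which is the same reasoning that makes the extra "! -l _" test redundant.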

Note that I replaced pop with pop @_. Leaving @_ implicit is against my best practices because I've seen code where that habit made it difficult to figure out how the subroutine arguments were being used (being explicit also prevents bareword problems and eliminates the risk of confusion with an implicit @ARGV).

I suspect you can drop the && ! -l _ from your code, since you'll have the cached results from lstat so -f _ being true will mean that the found item isn't a symbolic link. But leaving it in doesn't hurt either.

- tye        


In reply to Re: File::Find in a thread safe fashion (speed) by tye
in thread File::Find in a thread safe fashion by Preceptor
