I agree with the comments in the previous replies, and would add that there is also File::Finder, which provides something more like the command-line interface of the common unix "find" utility; like File::Find::Rule, this "amendment" to File::Find makes it a lot easier to come up with working code.

But both the ::Finder and ::Rule extensions are just wrappers around the core File::Find module, and all three suffer from the same problem relative to the basic "find" utility: they are much slower. This is the main reason why I hate File::Find and anything based on it.
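For comparison, here is roughly what the same "*.pm" search looks like when written directly against core File::Find, which both wrappers ultimately call. This is a minimal sketch: the sub name `find_pm_files` is hypothetical, and the `-f` test is only there to mirror File::Find::Rule's `->file()` restriction.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Collect the full paths of plain files named *.pm under the given
# directories, calling core File::Find directly with a "wanted" callback.
# (find_pm_files is a hypothetical helper name.)
sub find_pm_files {
    my @dirs = @_;
    my @found;
    find(
        sub {
            # File::Find chdir()s into each directory: $_ holds the bare
            # file name and $File::Find::name holds the full path.
            push @found, $File::Find::name if -f $_ && /\.pm\z/;
        },
        @dirs
    );
    return @found;
}

# Only scan when directories are given on the command line.
if (@ARGV) {
    my @pm = find_pm_files(@ARGV);
    print scalar @pm, " .pm files found\n";
}
```

The callback-plus-package-globals style (`$_`, `$File::Find::name`, the implicit chdir) is exactly what File::Finder and File::Find::Rule hide behind a friendlier interface; the traversal underneath is the same.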

I'd much rather open a pipeline file handle running the "find" command: the utility is either native or freely available for all common OSes, it's easy to use in a Perl script via the file handle idiom, and it runs a lot faster, typically by a factor of six in wallclock time.
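A minimal sketch of that pipeline idiom, assuming a Unix-like system with `find` on the PATH. The helper name `find_via_pipe` is hypothetical; the list form of `open` runs `find` without a shell, so directory names containing spaces or quotes need no escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run the system "find" through a pipeline file handle and return its
# output lines (one path per line) with the newlines removed.
# (find_via_pipe is a hypothetical helper name.)
sub find_via_pipe {
    my ( $dir, @criteria ) = @_;
    # List-form pipe open: arguments go straight to find, no shell quoting.
    open( my $fh, '-|', 'find', $dir, @criteria )
        or die "Cannot start find: $!\n";
    chomp( my @paths = <$fh> );
    # close() on a pipe waits for the child; a false return means find
    # exited non-zero (e.g. some directories were unreadable).
    close($fh) or warn "find exited with status ", $? >> 8, "\n";
    return @paths;
}

if (@ARGV) {
    my @pm = find_via_pipe( $ARGV[0], '-type', 'f', '-name', '*.pm' );
    print scalar @pm, " .pm files found under $ARGV[0]\n";
}
```

For example, `find_via_pipe('/usr', '-type', 'f', '-name', '*.pm')` returns the same kind of path list as `File::Find::Rule->file()->name('*.pm')->in('/usr')`.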

I posted a benchmark on File::Find four years ago, and another on File::Finder two years ago, so here's a new one for File::Find::Rule (using an example from the module's man page). All of these benchmarks show pretty much the same timing difference between the module and the system "find" utility.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;
use File::Find::Rule;

( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 some/path\n";

print "started at ", scalar localtime, $/;

timethese( 10, {
    'Shell-find pipe'  => \&try_pipe,
    'file::Find::Rule' => \&try_ffr,
});

sub try_ffr {
    my @f = File::Find::Rule->file()->name( '*.pm' )->in( $ARGV[0] );
    print scalar @f, " .pm files found under $ARGV[0] at ",
          scalar localtime, $/;
}

sub try_pipe {
    # List-form pipe open runs find without a shell, so paths containing
    # spaces are passed through intact; fail loudly if find can't start.
    open( my $find, '-|', 'find', $ARGV[0], '-name', '*.pm' )
        or die "Cannot run find: $!\n";
    my @f = <$find>;
    close $find;
    print scalar @f, " .pm files found under $ARGV[0] at ",
          scalar localtime, $/;
}

__END__
# sample run:
$ ffr-bm.pl /usr
started at Sun Jul 16 23:48:33 2006
Benchmark: timing 10 iterations of Shell-find pipe, file::Find::Rule...
481 .pm files found under /usr at Sun Jul 16 23:48:41 2006
481 .pm files found under /usr at Sun Jul 16 23:48:44 2006
481 .pm files found under /usr at Sun Jul 16 23:48:46 2006
481 .pm files found under /usr at Sun Jul 16 23:48:48 2006
481 .pm files found under /usr at Sun Jul 16 23:48:50 2006
481 .pm files found under /usr at Sun Jul 16 23:48:52 2006
481 .pm files found under /usr at Sun Jul 16 23:48:53 2006
481 .pm files found under /usr at Sun Jul 16 23:48:55 2006
481 .pm files found under /usr at Sun Jul 16 23:48:57 2006
481 .pm files found under /usr at Sun Jul 16 23:48:59 2006
Shell-find pipe: 26 wallclock secs ( 0.03 usr 0.04 sys + 8.73 cusr 5.79 csys = 14.59 CPU) @ 142.86/s (n=10)
481 .pm files found under /usr at Sun Jul 16 23:49:19 2006
481 .pm files found under /usr at Sun Jul 16 23:49:41 2006
481 .pm files found under /usr at Sun Jul 16 23:49:59 2006
481 .pm files found under /usr at Sun Jul 16 23:50:14 2006
481 .pm files found under /usr at Sun Jul 16 23:50:29 2006
481 .pm files found under /usr at Sun Jul 16 23:50:44 2006
481 .pm files found under /usr at Sun Jul 16 23:51:02 2006
481 .pm files found under /usr at Sun Jul 16 23:51:24 2006
481 .pm files found under /usr at Sun Jul 16 23:51:42 2006
481 .pm files found under /usr at Sun Jul 16 23:51:57 2006
file::Find::Rule: 178 wallclock secs (33.39 usr + 53.05 sys = 86.44 CPU) @ 0.12/s (n=10)
```

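The output shows that the OS's own file caching gives F::F::R an "unfair advantage": the shell-find pipe ran first, taking 7 sec on its first iteration and less than 3 sec on each of the remaining nine, so F::F::R started with a warm cache. But even with the caching already done, F::F::R still takes between 15 and 22 sec per iteration, and puts a much heavier load on the CPU: 86.44 CPU seconds in total, versus 14.59 for the pipe (nearly all of that in the child "find" processes). Note that Benchmark's "142.86/s" rate for the pipe is misleading, since it is computed from the parent process's CPU time only; the wallclock totals (26 sec vs. 178 sec) are the fair comparison. (This is with perl v5.8.6 built for darwin-thread-multi-2level on Mac OS X 10.4.7; I've seen similar results on FreeBSD and Linux.)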
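Because whichever test runs first pays for the cold cache, and because Benchmark's rate ignores child CPU, a fairer comparison times wall-clock seconds after one untimed warm-up pass per method. The sketch below does that with core `Time::HiRes`. It swaps in core File::Find for File::Find::Rule so it runs without CPAN modules; the sub names and the repetition count of 5 are arbitrary assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Time::HiRes qw(time);

# Count *.pm plain files using the system find through a pipe.
sub via_pipe {
    my ($dir) = @_;
    open( my $fh, '-|', 'find', $dir, '-type', 'f', '-name', '*.pm' )
        or die "Cannot start find: $!\n";
    my @f = <$fh>;
    close $fh;
    return scalar @f;
}

# Count the same files using core File::Find (stand-in for F::F::R).
sub via_file_find {
    my ($dir) = @_;
    my $n = 0;
    find( sub { $n++ if -f $_ && /\.pm\z/ }, $dir );
    return $n;
}

# Run $code once untimed to prime the OS file cache, then return the
# file count and the average wall-clock seconds over $reps timed passes.
sub wallclock {
    my ( $code, $dir, $reps ) = @_;
    $code->($dir);
    my ( $count, $t0 ) = ( 0, time() );
    $count = $code->($dir) for 1 .. $reps;
    return ( $count, ( time() - $t0 ) / $reps );
}

if ( @ARGV == 1 and -d $ARGV[0] ) {
    for my $pair ( [ 'find pipe', \&via_pipe ], [ 'File::Find', \&via_file_find ] ) {
        my ( $count, $secs ) = wallclock( $pair->[1], $ARGV[0], 5 );
        printf "%-10s %d files, %.3f s per pass\n", $pair->[0], $count, $secs;
    }
}
```

Warming each method's cache separately means neither one benefits from running second, which is the bias visible in the sample run above.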

If you aren't searching really big directory trees, or you don't care how long it takes, some version of File::Find is "good enough". But for serious work on a really large directory tree, it's worthwhile to take advantage of Perl's value as a "glue" language, making efficient use of existing system tools like "find", rather than relying on these particular modules.
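As one example of that "glue" approach, this sketch streams paths from `find` one at a time and hands each to a Perl callback, so memory use stays flat on very large trees. It assumes a `find` that supports `-print0` (GNU and BSD versions, including the one on Mac OS X, do); NUL-separated records keep file names containing newlines intact. The name `each_found` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream NUL-terminated paths from the system find and call $callback
# on each one as it arrives, instead of slurping the whole list.
# (each_found is a hypothetical helper; requires find with -print0.)
sub each_found {
    my ( $callback, $dir, @criteria ) = @_;
    open( my $fh, '-|', 'find', $dir, @criteria, '-print0' )
        or die "Cannot start find: $!\n";
    # $/ is localized only around the read, so the callback sees the
    # normal record separator.
    while ( defined( my $path = do { local $/ = "\0"; <$fh> } ) ) {
        $path =~ s/\0\z//;    # strip the NUL terminator
        $callback->($path);
    }
    close $fh;
}

if (@ARGV) {
    my $bytes = 0;
    each_found( sub { $bytes += ( -s $_[0] ) || 0 },
                $ARGV[0], '-type', 'f', '-name', '*.pm' );
    print "total size of .pm files: $bytes bytes\n";
}
```

Here "find" does the traversal it is good at, and Perl only does the per-file work that actually needs a programming language.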


Posted by graff as "Re: What makes File::Find's interface so commonly hated",
in the thread "What makes File::Find's interface so commonly hated" started by demerphq.
