I agree with the comments in the previous replies, and would add that there is also File::Finder, which provides something more like the command-line interface of the common unix "find" utility; like File::Find::Rule, this "amendment" to File::Find makes it a lot easier to come up with working code.

But both the ::Finder and ::Rule extensions are just wrappers around the core File::Find module, and all three end up suffering from the same problem relative to using the basic "find" utility -- they are much slower, and this is the main reason why I hate File::Find and anything based on it.

I'd much rather open a pipeline file handle running the "find" command: this utility is either native or freely available for all common OS's, it's pretty easy to use in a perl script via the file handle idiom, and it runs a lot faster -- typically by a factor of six in wallclock time.
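For anyone who hasn't used the file handle idiom, here is a minimal sketch of it. The helper name find_files is made up for illustration, and the sketch assumes a Unix-like system whose "find" supports -print0 (GNU and BSD/macOS versions do, but it is not guaranteed everywhere). The list form of open skips the shell, so odd characters in the path need no quoting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# find_files() is a hypothetical helper, not part of any module.
# It returns the regular files under $dir whose names match $pattern.
sub find_files {
    my ( $dir, $pattern ) = @_;

    # List-form pipe open: the arguments go straight to "find", no shell involved.
    # -print0 (assumed to be supported) separates names with NUL bytes,
    # so file names containing spaces or newlines stay intact.
    open( my $find, '-|', 'find', $dir, '-type', 'f', '-name', $pattern, '-print0' )
        or die "Cannot run find: $!\n";

    local $/ = "\0";                  # read NUL-terminated records
    chomp( my @files = <$find> );     # chomp strips the trailing NUL

    # close() on a pipe returns false if "find" exited with a non-zero status.
    close($find) or warn "find exited with status ", $? >> 8, "\n";
    return @files;
}

my $start = @ARGV ? $ARGV[0] : '.';
my @pm = find_files( $start, '*.pm' );
print scalar(@pm), " .pm files found under $start\n";
```

Slurping the whole list is fine for a few thousand names; for a really huge tree you would probably read the handle in a while loop instead.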

I posted a benchmark on File::Find four years ago, and another on File::Finder two years ago, so here's a new one for File::Find::Rule (using an example from the module's man page). All of these benchmarks show pretty much the same timing difference between the module and the system "find" utility.

#!/usr/bin/perl

use strict;
use Benchmark;
use File::Find::Rule;

( @ARGV == 1 and -d $ARGV[0] )
    or die "Usage: $0 some/path\n";

print "started at ", scalar localtime, $/;
timethese( 10, {
        'Shell-find pipe' => \&try_pipe,
        'file::Find::Rule' => \&try_ffr,
           });

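sub try_ffr {
    # Collect matching plain files with File::Find::Rule.
    my @f = File::Find::Rule->file()->name( '*.pm' )->in( $ARGV[0] );
    print scalar @f, " .pm files found under $ARGV[0] at ", scalar localtime, $/;
}

sub try_pipe {
    # Read matching names from the system "find" utility through a pipe;
    # the list form of open avoids shell quoting problems in the path.
    open( my $find, '-|', 'find', $ARGV[0], '-name', '*.pm' )
        or die "Cannot run find: $!\n";
    my @f = <$find>;
    close($find);
    print scalar @f, " .pm files found under $ARGV[0] at ", scalar localtime, $/;
}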

__END__

# sample run:

$ ffr-bm.pl /usr      
started at Sun Jul 16 23:48:33 2006
Benchmark: timing 10 iterations of Shell-find pipe, file::Find::Rule...
481 .pm files found under /usr at Sun Jul 16 23:48:41 2006
481 .pm files found under /usr at Sun Jul 16 23:48:44 2006
481 .pm files found under /usr at Sun Jul 16 23:48:46 2006
481 .pm files found under /usr at Sun Jul 16 23:48:48 2006
481 .pm files found under /usr at Sun Jul 16 23:48:50 2006
481 .pm files found under /usr at Sun Jul 16 23:48:52 2006
481 .pm files found under /usr at Sun Jul 16 23:48:53 2006
481 .pm files found under /usr at Sun Jul 16 23:48:55 2006
481 .pm files found under /usr at Sun Jul 16 23:48:57 2006
481 .pm files found under /usr at Sun Jul 16 23:48:59 2006
Shell-find pipe: 26 wallclock secs ( 0.03 usr  0.04 sys +  8.73 cusr  5.79 csys = 14.59 CPU) @ 142.86/s (n=10)
481 .pm files found under /usr at Sun Jul 16 23:49:19 2006
481 .pm files found under /usr at Sun Jul 16 23:49:41 2006
481 .pm files found under /usr at Sun Jul 16 23:49:59 2006
481 .pm files found under /usr at Sun Jul 16 23:50:14 2006
481 .pm files found under /usr at Sun Jul 16 23:50:29 2006
481 .pm files found under /usr at Sun Jul 16 23:50:44 2006
481 .pm files found under /usr at Sun Jul 16 23:51:02 2006
481 .pm files found under /usr at Sun Jul 16 23:51:24 2006
481 .pm files found under /usr at Sun Jul 16 23:51:42 2006
481 .pm files found under /usr at Sun Jul 16 23:51:57 2006
file::Find::Rule: 178 wallclock secs (33.39 usr + 53.05 sys = 86.44 CPU) @  0.12/s (n=10)

The output shows that the OS's own caching behavior gives an "unfair advantage" to F::F::R -- the "shell-find pipe" approach took 7 sec on its first iteration, and less than 3 sec on each of the remaining nine iterations. But even with the OS caching already done, F::F::R still takes between 15 and 22 sec per iteration, and puts a much heavier load on the cpu. (This is with perl, v5.8.6 built for darwin-thread-multi-2level on macosx 10.4.7; I've seen similar results on freebsd and linux.)
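If the caching effect bothers you, one way to even things out is an untimed warm-up pass before the timed runs. The following is only a sketch under a few assumptions: it swaps in the core File::Find module for File::Find::Rule so that it runs without any CPAN install, the sub names are made up, the iteration count of 5 is arbitrary, and both methods match on the name only (no plain-file test).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Find ();

# Count names ending in .pm by reading from the system "find" utility.
sub count_via_pipe {
    my ($dir) = @_;
    open( my $find, '-|', 'find', $dir, '-name', '*.pm' )
        or die "Cannot run find: $!\n";
    my $count = () = <$find>;    # number of lines read from the pipe
    close($find);
    return $count;
}

# Count the same names with the core File::Find module.
sub count_via_file_find {
    my ($dir) = @_;
    my $count = 0;
    File::Find::find( sub { $count++ if /\.pm\z/ }, $dir );
    return $count;
}

my $dir = @ARGV ? $ARGV[0] : '.';

# Untimed warm-up pass: without it, whichever method runs first pays
# the cost of filling the OS cache.
count_via_pipe($dir);

timethese( 5, {
    'Shell-find pipe' => sub { count_via_pipe($dir) },
    'File::Find'      => sub { count_via_file_find($dir) },
} );
```

On a small tree Benchmark will warn about too few iterations for a reliable count; point it at something sizable to get meaningful numbers.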

If you aren't doing any really big directory trees, and/or you don't care how long it takes, using some version of File::Find is "good enough", but for serious work on a really large directory tree, it's worthwhile to take advantage of perl's value as a "glue" language (to make efficient use of existing system resources), rather than taking advantage of these particular modules.

In reply to Re: What makes File::Find's interface so commonly hated by graff
in thread What makes File::Find's interface so commonly hated by demerphq
