nemesdani has asked for the wisdom of the Perl Monks concerning the following question:

Laudetur, monks.
I'm using File::Find to find certain files. The directory tree is huge, so I need to use the preprocess option. I'd like to filter out
  • All files that don't end with .log
  • All directories that have "advanced" in their names.
However, the preprocess returns everything, so obviously I'm doing something wrong. Please point out my mistakes. Thanks.

Code part:

    find ({ preprocess => \&preprocess, wanted => \&wanted }, $mypath);

    sub preprocess {
        my @toreturn = ();
        if (-f && /\.log$/ ) {push (@toreturn, $_);}
        elsif (-d && /advanced/ ) {push (@toreturn, $_);}
        return @toreturn;
    }

    sub wanted {
        if ($File::Find::dir =~ /$envName$/) {
            checkLastLine();
        }
        else {print "not same env: $File::Find::name\n";}
    }

    I'm too lazy to be proud of being impatient.

    Replies are listed 'Best First'.
    Re: File::find preprocess problem
    by Anonymous Monk on Apr 25, 2012 at 10:55 UTC

      However, the preprocess returns everything,

      That is kinda what you tell it to do. elsif (-d && /advanced/ ) {push (@toreturn, $_);} means "hey, I've found one I want to ignore, but I choose to NOT ignore it, aren't I funny?"

      so I need to use the preprocess possibility

      Not exactly :)

      use File::Find::Rule

          use File::Find::Rule;
          my $rule = File::Find::Rule->new;
          $rule->or(
              ## first rule, things to ignore, to skip
              $rule->new
                   ->directory
                   ->name('CVS', qr/advanced/i )
                   ->prune
                   ->discard,
              $rule->new
                   ->file
                   ->name('*.log')
          );
          my @files = $rule->in( @startdirs );

      or use Path::Class::Rule, it doesn't use File::Find underneath, and actually names the prune/discard option skip, and provides a real iterator, it's brilliant

          use Path::Class::Rule;
          my $rule = Path::Class::Rule->new;
          $rule->skip(
              $rule->new->dir->name(qr/advanced/i),
              $rule->new->skip_vcs
          );
          $rule->file->name('*.log');
          # iterator interface
          my $next = $rule->iter( @dirs );
          while ( my $file = $next->() ) {
              ...
          }
    Re: File::find preprocess problem
    by zentara (Cardinal) on Apr 25, 2012 at 11:07 UTC
      the preprocess returns everything, so obviously I'm doing something wrong.

      I don't have a directory tree to test on, but I think you are using preprocess in the wrong manner by returning a set of files. Preprocess should work by returning empty, unless some condition is met. Some pseudocode: ( my logic is probably not syntactically correct, but it shows the idea) :-)

          sub preprocess{
              if( $File::Find::dir =~ m/advanced/ ){
                  return $File::Find::prune = 1
              }
              return unless -f $File::Find::name;
              return unless $File::Find::name =~ m/\.log/;
          }

      I'm not really a human, but I play one on earth.
      Old Perl Programmer Haiku ................... flash japh
          Preprocess should work by returning empty, unless some condition is met.

        This is not right.

        I've played a bit with this option before. When File::Find enters a new directory, it does a readdir(). The output of that readdir() is what goes into the preprocess routine. What you return is a) a filtered version of that, or perhaps b) a sorted version of that. If you take a directory name out of this list, find() will not follow down that path (useful for pruning off a directory branch).
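        A minimal sketch of that behaviour, not the OP's real setup: the tree is a throwaway temp directory, and the names keep, advanced_stuff, a.log, b.txt and c.log are made up for illustration. The preprocess callback drops any directory whose name matches /advanced/ and sorts what is left, so wanted() is never called for anything underneath it.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a small throwaway tree (all names here are hypothetical).
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/keep", "$root/advanced_stuff" );
for my $file ( "keep/a.log", "keep/b.txt", "advanced_stuff/c.log" ) {
    open my $fh, '>', "$root/$file" or die "Cannot create $file: $!";
    close $fh;
}

# preprocess receives the readdir() list of the directory File::Find has
# just entered: bare names, including "." and "..". File::Find has already
# chdir'ed into that directory, so file tests on the bare names work.
sub drop_advanced_dirs {
    my @kept = grep { !( -d $_ && /advanced/ ) } @_;
    return sort @kept;
}

my @seen;
find(
    {
        preprocess => \&drop_advanced_dirs,
        wanted     => sub { push @seen, $File::Find::name if -f $_ },
    },
    $root,
);

print "$_\n" for @seen;
```

        Here the filtering is done on directories only, so b.txt still reaches wanted(); c.log does not, because its parent directory was removed from the list before find() could descend into it.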

        I'm not sure that using the preprocess option will make any significant difference in performance in this case. Depends upon how many files are in the /advanced/ directories.

        As a trick, there is a special filehandle _ (note: not $_). Every file test does a stat(); testing against _ re-uses the stat info from the previous file test (here, the -f test) instead of calling stat() again. This makes a performance difference, because stat() is not a quick operation. There are some "yeah, but"s concerning various types of links. I forget the details right now, but usually this is not an issue.
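        A small sketch of the _ trick on its own, using a temp file as a stand-in for a real log file (the variable names are just for illustration). Only the first test names the file; the two tests on _ read the stat buffer that the first one filled.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file to test against (hypothetical stand-in for a log).
my ( $fh, $path ) = tempfile( UNLINK => 1 );
print {$fh} "hello\n";
close $fh;

# The first test does the stat(); the tests on _ reuse its cached result.
my $is_plain_file = -f $path;
my $is_directory  = -d _;
my $size          = -s _;

printf "plain file: %s, directory: %s, size: %d bytes\n",
    ( $is_plain_file ? "yes" : "no" ),
    ( $is_directory  ? "yes" : "no" ),
    $size;
```

        The cached result is whatever the last stat() saw, so _ only makes sense right after a test on the same file.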

        Didn't test this, but I think this will work...if I got my unless logic right.

        Update: Oops, looked again and it appears that the code below should be changed: the OP wants to process .log files and follow all directories that aren't "advanced".

            sub preprocess {
                my @to_return;
                foreach (@_) {
                    # Don't follow down "advanced" directory paths.
                    # Do call wanted() on .log files and on any directory
                    # that is not an "advanced" one.
                    push @to_return, $_
                        if ( ( -f $_ and /\.log$/ ) or ( -d _ and !/advanced/ ) );
                }
                return @to_return;
            }
    Re: File::find preprocess problem
    by Anonymous Monk on Apr 25, 2012 at 10:56 UTC
      I find declarative/rule-based programming with Path::Class::Rule better. F::F has the worst interface of all file finding modules.
          use 5.010;
          use strictures;
          use Path::Class::Rule qw();
          my $next = Path::Class::Rule
              ->new
              ->skip_dirs(qr/advanced/) # should come before file tests for efficiency
              ->name(qr/[.]log\z/)
              ->iter('.') # starting dir(s)
              ;
          while (my $file = $next->()) {
              say $file;
          }