justinjoseph24 has asked for the wisdom of the Perl Monks concerning the following question:

Monks: The File::Find documentation, http://perldoc.perl.org/File/Find.html, says:

"When follow or follow_fast are in effect, preprocess is a no-op."

I wrote a search script that uses preprocess to improve performance, but I want to use it in an environment that uses symlinking heavily.

Is there any way around this?

Code:

#!/usr/bin/perl -w
use strict;
use File::Find;
use Getopt::Long;
use MLMS::Common qw ( shiftx );

my ( $retval, $start_hour, $end_hour, $dir, $cutNum, $ext, $excl, $prefix, $help );
our ( @exc );

$retval = &GetOptions(
    "start_hour", \$start_hour,
    "end_hour",   \$end_hour,
    "dirs",       \$dir,
    "cutNum",     \$cutNum,
    "ext",        \$ext,
    "excl",       \$excl,
    "prefix=s",   \$prefix,
    "help",       \$help
);

if (!$start_hour || !$end_hour || !$dir || !$cutNum || !$ext) {
    usage();
    exit(3);
}

if ( ! $excl ) {
    $excl=' ';
} else {
    @exc=split /,/,$excl;
}

$start_hour=shift;
$end_hour=shift;
$dir=shift;
$cutNum=shift;
$ext=shift;

unless ( $start_hour=~/^[0-9]+$/ && $end_hour=~/^[0-9]+$/ && $cutNum=~/^[0-9]+$/ ) {
    print "Not a number.\n";
    usage();
}

unless ( -e $dir ) {
    print "Not a directory.\n";
    usage();
}

my @exts=split /,/,$ext;
my $time=time();
my $hour=3600;
my $begin_offset=$time-$start_hour*$hour;
my $end_offset=$time-$end_hour*$hour;
my @starting_directories = ("$dir");

my ($callback, $yield) = create_find_callback_that_compares($begin_offset,$end_offset,\@exc);

find( { wanted => $callback, preprocess => \&exclude_dirs, }, @starting_directories );

my @files = $yield->( );

foreach my $changed ( @files ) {
    chomp($changed);
    foreach my $kind ( @exts ) {
        chomp($kind);
        if ( $changed=~m/$kind$/ ) {
            my @dirs = File::Spec->splitdir( $changed );
            shiftx(\@dirs,$cutNum);
            chomp(my $file = File::Spec->catdir( @dirs ));
            $file=~s/\\/\//g;
            if ( $prefix ) {
                print $prefix.'/'.$file."\n";
            } else {
                print '/'.$file."\n";
            }
        }
    }
}

sub create_find_callback_that_compares {
    my $begin_offset=shift;
    my $end_offset=shift;
    my $count = 0;
    my $excl=shift;
    (
        sub {
            return unless -e $_;
            my $fage= (stat $_)[9];
            if ( ! defined $fage ) {
                print "File age not defined for $File::Find::name\n";
            }
            push (@files, $File::Find::name) if $fage >= $begin_offset && $fage <= $end_offset;
        },
        sub { @files; }
    )
}

sub exclude_dirs {
    use Data::Dumper;
    my @inlist = @_;
    my @outlist;
    F: foreach (@inlist) {
        chomp(my $wanteded=$_);
        $wanteded=~s/\s+//g;
        foreach ( @exc ) {
            chomp(my $notwanteded=$_);
            $notwanteded=~s/\s+//g;
            next F if $wanteded eq $notwanteded;
        }
        push ( @outlist, $wanteded );
    }
    return @outlist;
}

Re: File::Find wanted and preprocess together
by andal (Hermit) on Nov 03, 2010 at 08:54 UTC

    And why can't you do the exclusion inside the "wanted" function?

      Not sure I understand what you mean.

      In the File::Find manpage, the preprocess routine is called before wanted. Are you implying that I could use the preprocess routine within the wanted routine? If so, could you please give an example?

        You don't use the "bydepth" option, which means you can cancel processing of any subdirectory by setting $File::Find::prune = 1. So, in your "wanted" function, check whether the current entry should be excluded: set $File::Find::prune for directories that should be excluded, and simply return for files that should be excluded. Something like:

        sub my_wanted {
            if (exists $skip_them{$_}) {
                # stop descending into excluded directories; skip excluded files
                $File::Find::prune = 1 if -d $_;
                return;
            }
            do_process($_);
        }
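
For anyone who wants to try this, here is a minimal, self-contained sketch of the prune-inside-wanted approach with follow enabled. The names %skip_them and find_unexcluded, the excluded names "cache" and "tmp", and the temporary demo tree are all hypothetical and exist only for illustration; the real script would fill the exclusion list from its own options. Note that the variable is spelled $File::Find::prune, in lower case.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical exclusion list (assumption: entries are matched by base name).
my %skip_them = map { $_ => 1 } qw(cache tmp);

# Return the sorted list of plain files under @start that are not excluded.
sub find_unexcluded {
    my @start = @_;
    my @found;
    find(
        {
            follow => 1,    # symlinks are followed, so preprocess would be a no-op
            wanted => sub {
                if ( exists $skip_them{$_} ) {
                    # Excluded directory: do not descend. Excluded file: just skip.
                    $File::Find::prune = 1 if -d $_;
                    return;
                }
                push @found, $File::Find::name if -f $_;
            },
        },
        @start
    );
    my @sorted = sort @found;
    return @sorted;
}

# Build a tiny demo tree: one kept directory and one excluded directory.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/src", "$root/cache" );
for my $path ( "$root/src/a.txt", "$root/cache/b.txt" ) {
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

# Only the file under src should be listed; cache is pruned.
print "$_\n" for find_unexcluded($root);
```

Because the check happens in wanted rather than in preprocess, the excluded directory itself is still visited once, but nothing beneath it is read. This only works as long as bydepth is not set.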