File::Find wanted and preprocess together

justinjoseph24 has asked for the wisdom of the Perl Monks concerning the following question:

Monks: The File::Find documentation, http://perldoc.perl.org/File/Find.html, states:

"When follow or follow_fast are in effect, preprocess is a no-op."

I wrote a search script that uses preprocess to improve performance, but I want to run it in an environment that relies heavily on symlinks.

Is there any way around this?

Code:

#!/usr/bin/perl -w
use strict;
use File::Find;
use Getopt::Long;
use MLMS::Common qw ( shiftx );

my ( $retval, $start_hour, $end_hour, $dir, $cutNum, $ext, $excl, $prefix, $help );
our ( @exc );

$retval = &GetOptions("start_hour", \$start_hour,
              "end_hour", \$end_hour,
              "dirs", \$dir,
              "cutNum", \$cutNum,
              "ext", \$ext,
              "excl", \$excl,
              "prefix=s",\$prefix,
                      "help", \$help
                      );

if (!$start_hour || !$end_hour || !$dir || !$cutNum || !$ext) {
    usage();
    exit(3);
}

if ( ! $excl ) {
    $excl=' ';    
}
else {
    @exc=split /,/,$excl;    
}

$start_hour=shift;
$end_hour=shift;
$dir=shift;
$cutNum=shift;
$ext=shift;

unless ( $start_hour=~/^[0-9]+$/ && $end_hour=~/^[0-9]+$/ && $cutNum=~/^[0-9]+$/ ) {
    print "Not a number.\n";
    usage();    
}

unless ( -d $dir ) {
    print "Not a directory.\n";
    usage();
}

my @exts=split /,/,$ext;

my $time=time();
my $hour=3600;

my $begin_offset=$time-$start_hour*$hour;
my $end_offset=$time-$end_hour*$hour;

my @starting_directories = ("$dir");

my ($callback, $yield) = create_find_callback_that_compares($begin_offset,$end_offset,\@exc);
find( { wanted => $callback,
    preprocess => \&exclude_dirs,
    }, @starting_directories );
my @files = $yield->( );

foreach my $changed ( @files ) {
    chomp($changed);
    foreach my $kind ( @exts ) {
        chomp($kind);
        if ( $changed=~m/$kind$/ ) {
            my @dirs = File::Spec->splitdir( $changed );
            shiftx(\@dirs,$cutNum); 
            chomp(my $file = File::Spec->catdir( @dirs ));
            $file=~s/\\/\//g;
            if ( $prefix ) {
                print $prefix.'/'.$file."\n";
            }
            else {
                print '/'.$file."\n";
            }
        }
    }
}


sub create_find_callback_that_compares {
    my $begin_offset=shift;
    my $end_offset=shift;
    my $count = 0;
    my $excl=shift;
    my @files;    # matches collected by the wanted callback
    (
        sub {
            return unless -e $_;
            my $fage = (stat $_)[9];    # modification time
            if ( !defined $fage ) {
                print "File age not defined for $File::Find::name\n";
                return;
            }
            push @files, $File::Find::name
                if $fage >= $begin_offset && $fage <= $end_offset;
        },
        sub { @files; }
    )
}

sub exclude_dirs {
    use Data::Dumper;
    my @inlist = @_;
    my @outlist;
    F: foreach (@inlist) {
        chomp(my $wanteded=$_);
        $wanteded=~s/\s+//g;
        foreach ( @exc ) {
            chomp(my $notwanteded=$_);
            $notwanteded=~s/\s+//g;
            next F if $wanteded eq $notwanteded;
        }
        push ( @outlist, $wanteded );
    }
 return @outlist;
}


Re: File::Find wanted and preprocess together by andal (Hermit) on Nov 03, 2010 at 08:54 UTC
Why can't you do the exclusion inside the "wanted" function?
Re^2: File::Find wanted and preprocess together by justinjoseph24 (Initiate) on Nov 03, 2010 at 17:04 UTC
I'm not sure I understand what you mean. According to the File::Find manpage, the preprocess routine is called before wanted. Are you suggesting that I could call the preprocess routine from within the wanted routine? If so, could you please give an example?
Re^3: File::Find wanted and preprocess together by andal (Hermit) on Nov 04, 2010 at 09:54 UTC
You are not using the "bydepth" option, which means you can cancel processing of any subdirectory by setting $File::Find::prune = 1. So, in your "wanted" function, check whether the current entry should be excluded: set $File::Find::prune for directories that should be excluded, and simply ignore files that should be excluded. Something like `sub my_wanted { if (exists $skip_them{$_}) { $File::Find::prune = 1 if -d $_; return; } do_process($_); }`
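For reference, here is a minimal, self-contained sketch of the pruning approach described above, run with follow enabled. It is only an illustration under stated assumptions: the helper name find_with_exclusions, the %$skip hash, and the keep/skip/outside directory layout are all hypothetical and not part of the original script. follow_skip => 2 is added so that entries reached twice through links are ignored rather than causing File::Find to die.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: walk $start, following symlinks, and return every
# plain file that is not excluded by name via the %$skip hash.
sub find_with_exclusions {
    my ( $start, $skip ) = @_;
    my @found;
    find(
        {
            follow      => 1,    # preprocess is a no-op in this mode
            follow_skip => 2,    # ignore duplicates instead of dying
            wanted      => sub {
                if ( exists $skip->{$_} ) {
                    # Do not descend into excluded directories;
                    # excluded plain files are simply ignored.
                    $File::Find::prune = 1 if -d $_;
                    return;
                }
                push @found, $File::Find::name if -f $_;
            },
        },
        $start
    );
    return @found;
}

# Build a throwaway tree (assumed layout, for demonstration only).
my $root = tempdir( CLEANUP => 1 );
make_path( map { File::Spec->catdir( $root, @$_ ) }
        [ 'tree', 'keep' ], [ 'tree', 'skip' ], ['outside'] );
for my $path ( [ 'tree', 'keep', 'a.txt' ], [ 'tree', 'skip', 'b.txt' ],
    [ 'outside', 'c.txt' ] )
{
    my $file = File::Spec->catfile( $root, @$path );
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh;
}

# Link to a directory outside the start tree; symlink() is not available
# on every platform, so a failure here is tolerated.
eval {
    symlink File::Spec->catdir( $root, 'outside' ),
        File::Spec->catdir( $root, 'tree', 'link' );
};

my @files = find_with_exclusions( File::Spec->catdir( $root, 'tree' ),
    { skip => 1 } );
print "$_\n" for sort @files;
```

Because the check runs on the basename in $_, this excludes every entry with a listed name at any depth, which is roughly what the preprocess filter in the question does. To exclude specific paths instead, one could key the hash on $File::Find::name.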
Re^4: File::Find wanted and preprocess together by justinjoseph24 (Initiate) on Nov 09, 2010 at 19:54 UTC