springgem has asked for the wisdom of the Perl Monks concerning the following question:

I'm new to PerlMonks and fairly new to Perl. Long story short, I'm trying to clean up and speed up some code, and I'm struggling with File::Find::Rule. Other posts here and on StackOverflow helped, but I can't get that last little bit working.

The story: I want to build a list of directories or files, but exclude some based on patterns.

It started with:

```perl
find ( {
    no_chdir => 0,
    wanted   => sub {
        return unless -d;    # skip files; we're tagging folders
        return if $File::Find::name =~ /\/\./;
        return if $File::Find::name =~ /\./;
        return if $File::Find::name =~ /\.store$/i;
        return if $File::Find::name =~ /\/LOG(\/|$)/i;
        return if $File::Find::name =~ /\.go\./i;
        return if $File::Find::name =~ /\/cache(\/|$)/i;
        return if $File::Find::name =~ /\.store$/i;
        return if $File::Find::name =~ /\/AVCHD(\/|$)/i;
        push ( @fileList, $File::Find::name );    # get name with path
    }
}, $tagLocation );
```

Hideous, but it worked. So I cleaned it up to:

```perl
our @dirExclusions = qw( \. \/LOG(\/|$) \/cache(\/|$) \/AVCHD(\/|$) );
# call regex only once
our $dirExclusionsQR = join( '|', @dirExclusions );
# compile the expression
$dirExclusionsQR = qr{$dirExclusionsQR}i;

my @fileList3;
find ( {
    no_chdir => 0,
    wanted   => sub {
        return unless -d;    # skip files; we're tagging folders
        return if $File::Find::name =~ $dirExclusionsQR;
        push ( @fileList3, $File::Find::name );
    }
}, $tagLocation );
```
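The combined pattern can also be checked in isolation before it goes into find(). This is a minimal sketch with made-up paths. Because \. is one of the alternatives and the match runs against the full path, a dot anywhere in the path (including in a parent directory) excludes it.

```perl
use strict;
use warnings;

# Same patterns as above; qw() does not interpolate, so each '$' stays a regex anchor
my @dirExclusions = qw( \. \/LOG(\/|$) \/cache(\/|$) \/AVCHD(\/|$) );

# Join into one alternation and compile it once, case-insensitively
my $dirExclusionsQR = join '|', @dirExclusions;
$dirExclusionsQR = qr{$dirExclusionsQR}i;

# Hypothetical full paths, as File::Find would report them in $File::Find::name
my @samples = (
    '/photos/2021',           # kept
    '/photos/Log',            # excluded: /LOG at the end, case-insensitive
    '/photos/Logs',           # kept: LOG must be followed by / or the end
    '/photos/cache/thumbs',   # excluded: /cache/ in the middle
    '/photos/cached',         # kept
    '/photos/.hidden',        # excluded: \. matches a dot anywhere in the path
);

for my $path (@samples) {
    printf "%-22s %s\n", $path, $path =~ $dirExclusionsQR ? 'excluded' : 'kept';
}
```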

Much better, and I get the benefit of qr//. Life is good. Now I want to make it more readable, and perhaps faster, with File::Find::Rule and ->name()->prune. So:

```perl
our @dirExclusions   = qw( \. \/LOG(\/|$) \/cache(\/|$) \/AVCHD(\/|$) );
our @dirExclusionsQR = map { qr/$_/i } @dirExclusions;

my $rule = File::Find::Rule->new;
$rule->or(
    $rule->new->directory->name( @dirExclusionsQR )->prune->discard,
    $rule->new->directory
);
my @fileList2;
@fileList2 = $rule->in($tagLocation);
```

This version doesn't return anything. I'll spare you the variants of putting qr inside @dirExclusions, using q() instead of qw(), taking off the '\/' (on the assumption File::Find::Rule wasn't matching on the full path), and such.

Next attempt: use a string, as in @fileList3 above.

```perl
our @dirExclusions   = qw( \. \/LOG(\/|$) \/cache(\/|$) \/AVCHD(\/|$) );
our $dirExclusionsQR = join( '|', @dirExclusions );
$dirExclusionsQR = qr{$dirExclusionsQR}i;

my $rule = File::Find::Rule->new;
$rule->or(
    $rule->new->directory->name( $dirExclusionsQR )->prune->discard,
    $rule->new->directory
);
my @fileList2;
@fileList2 = $rule->in($tagLocation);
```

Null output. How about typing it in directly?

```perl
$rule->or(
    $rule->new->directory
         ->name( qr/(\.|\/LOG(\/|$)|\/cache(\/|$)|\/AVCHD(\/|$))/i )
         ->prune->discard,
    $rule->new->directory
);
```

Doesn't work either.


Finally, out of desperation:

```perl
our @de = qw (*.* LOG cache AVCHD);

my $rule = File::Find::Rule->new;
$rule->or(
    $rule->new->directory->name( @de )->prune->discard,
    $rule->new->directory
);
my @fileList2;
@fileList2 = $rule->in($tagLocation);
```

It does work, but it relies on glob wildcards rather than regexes, and I don't think the wildcards are compiled, so it doesn't solve my regex issue. What am I missing?

Thanks!

Re: Formatting Regex for File::Find::Rule
by haukex (Archbishop) on Apr 09, 2021 at 15:11 UTC
    $rule->new->directory->name( @dirExclusionsQR )->prune->discard

    ->prune means "don't descend into this directory". Your first pattern is \., which matches the base directory itself, so File::Find::Rule isn't even descending past it. Second, note that ->name matches on the basename only, so the extra slashes you've got in there won't match. This works for me; hopefully it's what you're looking for:

    ```perl
    use warnings;
    use strict;
    use File::Find::Rule;

    my $tagLocation   = '...';
    my @dirExclusions = qw/ LOG cache AVCHD /;

    my $rule = File::Find::Rule->new;
    $rule->or(
        $rule->new->directory->name( @dirExclusions )->prune->discard,
        $rule->new->directory
    );
    my @fileList = $rule->in($tagLocation);
    ```
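For comparison, roughly the same traversal can be written with only the core File::Find module, where pruning means setting $File::Find::prune inside wanted. This is a minimal sketch, not a drop-in replacement: the temporary directory tree is invented for the example, and the single anchored, case-insensitive regex is an assumption standing in for the name list (the /i mirrors the original qr//i patterns). In File::Find's default chdir mode, $_ holds only the entry's basename, which is the same reason slash-containing patterns cannot match there.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical tree standing in for $tagLocation
my $tagLocation = tempdir(CLEANUP => 1);
make_path("$tagLocation/$_")
    for qw(2021/LOG/old 2021/trip cache/thumbs AVCHD/BDMV photos);

# Whole-name, case-insensitive exclusions (assumed equivalent of the name list)
my $excludeQR = qr/^(?:LOG|cache|AVCHD)$/i;

my @fileList;
find({
    wanted => sub {
        return unless -d;                # directories only
        if ($_ =~ $excludeQR) {          # $_ is the basename in chdir mode
            $File::Find::prune = 1;      # do not descend into it
            return;                      # and do not record it
        }
        push @fileList, $File::Find::name;   # full path
    },
}, $tagLocation);

print "$_\n" for sort @fileList;
```

Running it should list the base directory plus 2021, 2021/trip and photos; LOG, cache and AVCHD are skipped along with everything beneath them.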

      Thanks for the quick reply. Makes sense, though '*.*' does continue and returns files as expected. I'll be careful about '.' going forward. That's a wise point.

      Do you have any ideas as to why I can't get qr// to work inside name()?

        '*.*' does continue and returns files as expected

        '*.*' is being expanded to a regular expression by File::Find::Rule (more specifically, by glob_to_regex from Text::Glob): m{(?^:^(?=[^\.])(?:(?!\/).)*\.(?:(?!\/).)*$)}
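That expansion also suggests why '*.*' behaved differently from \.: the (?=[^\.]) lookahead rejects any name that starts with a dot, so a bare '.' is not matched, while \. matches it. Assuming the base directory is seen under the name '.', this would explain why the '*.*' version still descended. A minimal sketch that applies the regex, copied by hand from the expansion above rather than generated with Text::Glob, to a few hypothetical names:

```perl
use strict;
use warnings;

# Expansion of the glob '*.*' as quoted above, copied by hand (not built with Text::Glob)
my $starDotStar = qr{^(?=[^\.])(?:(?!\/).)*\.(?:(?!\/).)*$};

# Hypothetical directory names; '.' stands in for the start directory
for my $name ('.', '2021', 'v1.2', '.hidden', 'photos.library') {
    printf "%-15s %s\n", $name, $name =~ $starDotStar ? 'pruned' : 'kept';
}
```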

        Do you have any ideas as to why I can't get qr// to work inside name()?

        ->name( qr/LOG|cache|AVCHD/ ) works for me the same as my array suggestion above; as I said, it won't work if you keep the slashes in.
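To make the last point concrete, here is a minimal sketch that tests three patterns against bare directory names, using File::Basename's basename() as a stand-in for what ->name() sees. The paths are made up, and the anchored ^(?:...)$ variant is a hypothetical tightening, not something from the replies. The slashed pattern never matches a basename. The unanchored qr/LOG|cache|AVCHD/ also matches names such as CATALOG or precache, whereas glob patterns, judging by the '*.*' expansion shown above, are anchored with ^ and $.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Patterns under comparison; 'anchored' is a hypothetical tightening
my %patterns = (
    slashed  => qr{\/LOG(\/|$)|\/cache(\/|$)|\/AVCHD(\/|$)}i,  # written for full paths
    loose    => qr/LOG|cache|AVCHD/,                            # as in the reply above
    anchored => qr/^(?:LOG|cache|AVCHD)$/,                      # whole name only
);

for my $path (qw(/photos/LOG /photos/CATALOG /photos/precache /photos/AVCHD)) {
    my $name = basename($path);    # only the last path component
    my @hits = grep { $name =~ $patterns{$_} } sort keys %patterns;
    printf "%-9s matched by: %s\n", $name, @hits ? join(', ', @hits) : 'none';
}
```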