We set up some systems to copy selected configuration files to a documentation server nightly. Figuring out what those files should be proved to be complicated, and we wrapped it into a Perl program. The output of the program is fed into an awk command to do the actual copying/archiving; but the script itself became somewhat complicated.

So after thinking about it a couple days, I wrote an alternate Perl program to do the same thing. This attempts to be rules-based, and the rules should be easy to add or delete.

This is basically just a proof-of-concept.

There is an artifact here of the original solution: we had gathered regexps to match file/directory names as strings and converted them to regexps. This was so sysadmins not terribly comfortable with Perl regexps could add them somewhat intuitively.

#!/usr/bin/perl # # Sample script: create a list of rules and return the list of matchin +g files # MPeever # 2008-10-15 # # Revisions: # 2008-10-18 # - Added in exclude rules, expanded debug/verbose output # 2008-10-23 # - Made regexp changes suggested my jwkrahn on PerlMonks # - Put in temp variable $pattern in the regexp piece to eliminate # changing $_ in a map {} use strict; use warnings; use Data::Dumper; use File::Find; use File::Spec; die "Usage: $0 [-v|--verbose] [dir [dir...]]" unless scalar @ARGV; my $VERBOSE = scalar grep { $_ eq '-v' or $_ eq '--verbose' } @ARGV; my @dirs = grep { -d } @ARGV; die "No directories in: (", join (', ', @ARGV), ")" unless scalar @dir +s; ## # Use File::Find to find every file under any given directory # I've included the &wanted function as a lambda, since I only intend +to call # File::Find once my @files = (); find ( sub { my $n = $File::Find::name; if ($VERBOSE) { print STDERR '.' if -f $n; print STDERR '+' if -d $n; print STDERR '!' if -l $n; } push (@files, $n) if -f $n ; }, @dirs); ## # This isn't part of the core logic: we're just trying to make regexp # generation easy for admins with weak regex chops # The original use case used regexps to select files of interest. # Updated based on advice from PerlMonks my @name_patterns = qw { *.conf$ ^ho ^net }; my @name_regexps = map { (my $pattern = $_ ) =~ s/\./\\./g; $pattern =~ s/\?/(?s:.)/g; $pattern =~ s/\*/(?s:.*?)/g; qr {$pattern}; } @name_patterns; ## # @include_rules holds rules to select files # each $rule can access: # - $filename: short name of the file # - $fullpath: full path to file # - $parent: parent directory of the file # each $rule must return true or false ; # # I want to figure out how to do this without declaring $parent, $file +name, # and $fullpath globally my ($parent, $filename, $fullpath); my @include_rules = ( # Include everything in a "postfix" directory sub { $parent =~ /postfix/ }, # Include every file matching our @name_regexps sub { scalar grep { $filename =~ $_ } @name_regexps }, # Include every dir matching our @name_regexps sub { scalar grep { $parent =~ $_ } @name_regexps } ); ## # @exclude_rules is the same as @include_rules, except any file # matching ANY $rule is dropped from the list my @exclude_rules = ( # Exclude symlinks sub { -l $filename }, # Exclude files under 'racoon' sub { $parent =~ /racoon/ }, # Exclude Apache configuration sub { $parent =~ /(apache)|(httpd)/ }, # Exclude emacs backup files sub { $filename =~ /~$/ }, # Exclude stagnant files sub { -M $fullpath > 365 } ); ## # Debug output if ($VERBOSE) { print STDERR "\n\n"; print STDERR Dumper \@name_regexps, \@include_rules, \@exclude_rules + ; print STDERR "\n\n"; } ## # Apply each rule in @include_rules and @exclude_rules to every file # Keep that file if ANY $include_rule returns true # AND NO $exclude_rule returns true @files = grep { (my $v, $parent, $filename) = File::Spec->splitpath($_ +); $fullpath = $_; scalar grep { $_->() } @include_rules and not scalar grep { $_->() } @exclude_rules } @files; ## # Final action: print all the selected files print join("\n", sort @files), "\n\n" ;

Replies are listed 'Best First'.
Re: File finder
by jwkrahn (Abbot) on Oct 19, 2008 at 03:32 UTC
    ## # This isn't part of the core logic: we're just trying to make regexp # generation easy for admins with weak regex chops # The original use case used regexps to select files of interest. my @name_patterns = qw { *.conf$ ^ho ^net }; my @name_regexps = map { s/\./\\./g; s/\*/(\.+)?/g; qr {$_}; } @name_patterns;

    The  * glob pattern can match zero characters but you are replacing it with  (\.+)? which has to match at least one character.   Perhaps you want to use this instead:  s/\*/(?s:.*?)/g;   And don't forget the  ? glob pattern which must match one character:  s/\?/(?s:.)/g;

      You're absolutely right: the glob replacement is incorrect. Worse, it worked "close enough" that I managed to think I had done it right. I appreciate you pointing that out, I'll work on fixing that.

      To be perfectly honest, building regexps on the fly seemed clever at the time, but I'm not really sure it bought us that much. I was thinking in terms of typical shell usage, where *.conf refers to any file ending in ".conf". But the reality is, /\.conf$/ matches that already, with no need to explicitly name a wildcard. Additionally, I ended up using anchors in the strings, which is precisely what I was trying to avoid.

      Thanks again for the regex suggestions, I made them and they're exactly what I was going for.