File finder

We set up some systems to copy selected configuration files to a documentation server nightly. Figuring out what those files should be proved to be complicated, and we wrapped it into a Perl program. The output of the program is fed into an awk command to do the actual copying/archiving; but the script itself became somewhat complicated.

So after thinking about it a couple days, I wrote an alternate Perl program to do the same thing. This attempts to be rules-based, and the rules should be easy to add or delete.

This is basically just a proof-of-concept.

There is an artifact here of the original solution: we had gathered regexps to match file/directory names as strings and converted them to regexps. This was so sysadmins not terribly comfortable with Perl regexps could add them somewhat intuitively.

#!/usr/bin/perl

#
# Sample script: create a list of rules and return the list of matchin
+g files
# MPeever 
# 2008-10-15
#
# Revisions:
# 2008-10-18 
# - Added in exclude rules, expanded debug/verbose output
# 2008-10-23 
# - Made regexp changes suggested my jwkrahn on PerlMonks
# - Put in temp variable $pattern in the regexp piece to  eliminate 
#   changing $_ in a map {}

use strict;
use warnings;
use Data::Dumper;
use File::Find;
use File::Spec;

die "Usage: $0 [-v|--verbose] [dir [dir...]]" unless scalar @ARGV;
my $VERBOSE = scalar grep { $_ eq '-v' or $_ eq '--verbose' } @ARGV;

my @dirs = grep { -d } @ARGV;
die "No directories in: (", join (', ', @ARGV), ")" unless scalar @dir
+s; 

##
# Use File::Find to find every file under any given directory
# I've included the &wanted function as a lambda, since I only intend 
+to call
# File::Find once
my @files = ();
find ( sub { my $n = $File::Find::name;
         if ($VERBOSE) {
           print STDERR '.' if -f $n;
           print STDERR '+' if -d $n;
           print STDERR '!' if -l $n;
         }
         push (@files, $n) if -f $n ; }, @dirs);

## 
# This isn't part of the core logic: we're just trying to make regexp
# generation easy for admins with weak regex chops
# The original use case used regexps to select files of interest.
# Updated based on advice from PerlMonks
my @name_patterns = qw { *.conf$ ^ho ^net };
my @name_regexps = map { (my $pattern = $_ ) =~ s/\./\\./g;
             $pattern =~ s/\?/(?s:.)/g;
             $pattern =~ s/\*/(?s:.*?)/g;
             qr {$pattern}; } @name_patterns;

##
# @include_rules holds rules to select files
# each $rule can access:
# - $filename: short name of the file
# - $fullpath: full path to file
# - $parent: parent directory of the file
# each $rule must return true or false ;
#
# I want to figure out how to do this without declaring $parent, $file
+name,
# and $fullpath globally
my ($parent, $filename, $fullpath);
my @include_rules = (
             # Include everything in a "postfix" directory
             sub { $parent =~ /postfix/ },
             # Include every file matching our @name_regexps
             sub { scalar grep { $filename =~ $_ } @name_regexps },
             # Include every dir matching our @name_regexps
             sub { scalar grep { $parent =~ $_ } @name_regexps }
            );

##
# @exclude_rules is the same as @include_rules, except any file
# matching ANY $rule is dropped from the list
my @exclude_rules = (
             # Exclude symlinks
             sub { -l $filename },
             # Exclude files under 'racoon'
             sub { $parent =~ /racoon/ },
             # Exclude Apache configuration
             sub { $parent =~ /(apache)|(httpd)/ },
             # Exclude emacs backup files
             sub { $filename =~ /~$/ },
             # Exclude stagnant files
             sub { -M $fullpath > 365 }
             );

##
# Debug output
if ($VERBOSE) {
  print STDERR "\n\n";
  print STDERR Dumper \@name_regexps, \@include_rules, \@exclude_rules
+ ;
  print STDERR "\n\n";
}

##
# Apply each rule in @include_rules  and @exclude_rules to every file 
# Keep that file if ANY $include_rule returns true 
# AND NO $exclude_rule returns true
@files = grep { (my $v, $parent, $filename) = File::Spec->splitpath($_
+);
        $fullpath = $_;
        scalar grep { $_->() } @include_rules 
          and not scalar grep { $_->() } @exclude_rules } @files;

## 
# Final action: print all the selected files
print join("\n", sort @files), "\n\n" ;
[download]

Comment on File finder Download Code

Replies are listed 'Best First'.
Re: File finder by jwkrahn (Abbot) on Oct 19, 2008 at 03:32 UTC
`## # This isn't part of the core logic: we're just trying to make regexp # generation easy for admins with weak regex chops # The original use case used regexps to select files of interest. my @name_patterns = qw { .conf$ ^ho ^net }; my @name_regexps = map { s/\./\\./g; s/\/(\.+)?/g; qr {$_}; } @name_patterns;` [download] The `` glob pattern can match zero* characters but you are replacing it with `(\.+)?` which has to match at least one character. Perhaps you want to use this instead: `s/\/(?s:.?)/g;` And don't forget the `?` glob pattern which must match one character: `s/\?/(?s:.)/g;`	[reply] [d/l] [select]
Re^2: File finder by mpeever (Friar) on Oct 19, 2008 at 15:37 UTC
You're absolutely right: the glob replacement is incorrect. Worse, it worked "close enough" that I managed to think I had done it right. I appreciate you pointing that out, I'll work on fixing that. To be perfectly honest, building regexps on the fly seemed clever at the time, but I'm not really sure it bought us that much. I was thinking in terms of typical shell usage, where `*.conf` refers to any file ending in ".conf". But the reality is, `/\.conf$/` matches that already, with no need to explicitly name a wildcard. Additionally, I ended up using anchors in the strings, which is precisely what I was trying to avoid.	[reply] [d/l] [select]
Re^2: File finder by mpeever (Friar) on Oct 23, 2008 at 23:30 UTC
Thanks again for the regex suggestions, I made them and they're exactly what I was going for.	[reply]