Regex Basics

mbeason has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Brothers of Perl, I'm trying to create a program that looks in a directory for a file matching a specific pattern. If it doesn't find pattern X, it should print the preceding directory name, which happens to be a server name. It should print that server name only once, no matter how many files are in the directory. My humble code follows:

my(@node_list) = glob("/appl/perform/workspace/htdocs/node_reports/*/*");
my($node) = "";
foreach $file (@node_list) {
    next if($file eq ".") or ($file eq "..") or ($file eq "lost+found");
    (undef,undef,undef,undef,undef,undef,$node,$filename) = split /\//, $file;
    my($pattern) = "business_use\.$node";
    chomp($node); chomp($filename);
    unless($filename =~ /^($pattern)/) {
        %needs_desc = ("$node" =>  1);
        delete $needs_desc{$node} if("$filename" eq "$pattern");
        while ( ($key, $value) = each %needs_desc ) {
            print "$key => $value\n";
        }
    }
}

At this point, it will print the $node and "1" for every file in the directory. I had thought it would populate the hash and then remove the $key if the $pattern exists. What I ultimately hope to do is put this in a CSV file with just the $node name. Any guidance you can offer would be appreciated.


Replies are listed 'Best First'.
Re: Regex Basics by Tanktalus (Canon) on Feb 07, 2007 at 20:16 UTC
Your description and your code are doing two very different things. Of course, that's why it's not doing what you want ;-) You're looking for all directories that do not have a file "business_use.$node" in them, right? So, why not just do exactly that?

use File::Basename;
use File::Spec;

# grab the dirs.
my @dirs = grep { -d $_ } glob '/appl/perform/workspace/htdocs/node_reports/*';

# look for any dirs that do not have the desired file in them.
my @missing_business_use_dirs = grep {
    # find the node name (it's the base name of the current directory)
    my $node = basename($_);
    # check if the desired file exists - keep the ones where it doesn't.
    not -e File::Spec->catfile($_, "business_use.$node");
} @dirs;

Now @missing_business_use_dirs has your list of nodes.
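To try the directory test above without touching the real tree, here is a minimal sketch that builds a throwaway directory with File::Temp and prints one node name per line, which is already a valid single-column CSV. The node names web01 and web02 are hypothetical, and the sketch assumes the temporary path contains no whitespace, since glob splits its pattern on whitespace.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway tree standing in for node_reports; node names are hypothetical.
my $root = tempdir(CLEANUP => 1);
for my $node (qw(web01 web02)) {
    mkdir File::Spec->catdir($root, $node) or die "mkdir failed: $!";
}

# Only web01 gets its business_use file.
my $desc = File::Spec->catfile($root, 'web01', 'business_use.web01');
open my $fh, '>', $desc or die "cannot create $desc: $!";
close $fh;

# Keep the directories that lack business_use.<node>.
my @dirs = grep { -d $_ } glob "$root/*";
my @missing;
for my $dir (@dirs) {
    my $node = basename($dir);
    my $want = File::Spec->catfile($dir, "business_use.$node");
    push @missing, $node unless -e $want;
}
@missing = sort @missing;

# One node name per line.
print "$_\n" for @missing;
```

Because each directory is tested exactly once, every node can appear at most once in the output, however many files it holds.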
Re: Regex Basics by GrandFather (Saint) on Feb 07, 2007 at 20:29 UTC
The following (untested) code should be closer to what you want:

use strict;
use warnings;

my @node_list = glob("/appl/perform/workspace/htdocs/node_reports/*/*");
my $node = "";
my %needs_desc;

foreach my $file (@node_list) {
    next if $file =~ /^(?:\.\.? | lost\+found)$/x;
    my ($node, $filename) = (split /\//, $file)[6,7];
    my $pattern = "business_use\.$node";
    $needs_desc{$node} ||= 0;
    ++$needs_desc{$node} if $filename =~ /^$pattern/;
}

for my $node (sort keys %needs_desc) {
    print "$node\n" if !$needs_desc{$node};
}

You really need to visit the Tutorials section and do a little reading about hashes in particular. Note that I've fixed a number of errors and refactored some of the code somewhat, in large part to cut away cruft and make the solution to the actual problem clearer (I hope).

DWIM is Perl's answer to Gödel
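One detail worth checking in a loop like this is that the per-node flag is initialised only once. If it were reset for every file, a later file in the same directory would erase an earlier match. Here is a minimal sketch of that bookkeeping, using a hypothetical file list in place of the glob() result; the node and file names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical file list standing in for the glob() result.
my @files = qw(
    /appl/perform/workspace/htdocs/node_reports/web01/business_use.web01
    /appl/perform/workspace/htdocs/node_reports/web01/cpu.html
    /appl/perform/workspace/htdocs/node_reports/web02/cpu.html
    /appl/perform/workspace/htdocs/node_reports/web02/disk.html
);

my %has_desc;
for my $file (@files) {
    # Fields 6 and 7 of the split path are the node directory and the file name.
    my ($node, $filename) = (split m{/}, $file)[6, 7];

    # Initialise the flag once per node, then only ever raise it.
    $has_desc{$node} = 0 unless exists $has_desc{$node};
    $has_desc{$node} = 1 if $filename eq "business_use.$node";
}

# Nodes whose flag was never raised are the ones lacking the file.
my @missing = grep { !$has_desc{$_} } sort keys %has_desc;
print "$_\n" for @missing;
```

Here web01 keeps its flag even though cpu.html is seen after business_use.web01, so only web02 is reported.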
Re: Regex Basics by graff (Chancellor) on Feb 08, 2007 at 04:03 UTC
What did you expect this line of your code to do? `chomp($node); chomp($filename);` If you read the output of 'perldoc -f chomp', you will see that in effect it does `s{ \Q $/ \E $ }{}x` on each of the args that you give it; that is, if a string ends with the character (sequence) that is the current value of the global INPUT_RECORD_SEPARATOR variable, then chomp removes that character (sequence) from the end of the string. When you read file names from a directory via glob() or readdir() (or File::Find), they do not come with newlines at the end of each name -- directory entries normally do not contain linefeed or CRLF as part of the file name.

Apart from that, you seem to be "populating the hash" from scratch, loading exactly one element into the hash each time, at every iteration of the foreach loop -- the hash never has more than one element in it: `%needs_desc = ("$node" => 1);` and then you're printing that one element if the filename happens to differ from your target pattern, also at every iteration of the foreach loop. I think the other replies above have already pointed you in a better direction.
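Both points can be checked in a few lines. This is only a sketch with hypothetical names: it shows that chomp removes nothing from a string without a trailing newline, and that assigning a list to the whole hash replaces its contents, while assigning to a single key accumulates entries.

```perl
use strict;
use warnings;

# chomp strips only a trailing $/ (newline by default); this name has none.
my $name    = "business_use.web01";          # hypothetical file name
my $removed = chomp(my $copy = $name);       # chomp returns the count removed
print "chomp removed $removed character(s)\n";

# Assigning a list to the whole hash throws away whatever was there before.
my %needs_desc;
for my $node (qw(web01 web02 web03)) {       # hypothetical node names
    %needs_desc = ($node => 1);
}
my $after_list_assign = scalar keys %needs_desc;

# Assigning to one key at a time keeps the earlier entries.
%needs_desc = ();
for my $node (qw(web01 web02 web03)) {
    $needs_desc{$node} = 1;
}
my $after_key_assign = scalar keys %needs_desc;

print "list assignment leaves $after_list_assign key(s), ",
      "per-key assignment leaves $after_key_assign key(s)\n";
```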
Re: Regex Basics by Moron (Curate) on Feb 08, 2007 at 17:24 UTC
- Seemingly contrary to the first line of the code, the node-named directories are one level higher than where the ordinary files for the node are located.
- As a good habit, follow glob with an expression that includes the '*' in single quotes -- "*" is prone to be dereferenced by Perl before the glob() gets called.
- grep and glob could be combined more succinctly and therefore more readably.
- Sometimes opendir etc. fits better than glob in the code -- in this case both seem to have their place.
- Where a path is used more than once in the code, put it in a variable early on to avoid data duplication (maintainability). Consider also putting paths, regexps and filenames in configuration files to avoid hardcoding completely.

For example ...

my $tree = "/appl/perform/workspace/htdocs/node_reports";
opendir my $dh, $tree or die $!;
for my $node ( grep !( /^\./ || /^lost\+found$/ ), readdir $dh ) {
    print "$_\n" for grep !/^business_use\.$node$/, glob join( '/', $tree, $node, '*' );
}
closedir $dh;

-M

Free your mind
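On the quoting point, it may help to see what actually reaches glob(). As far as I know, only `$` and `@` (plus backslash escapes) are special inside double quotes, so a bare `*` should arrive unchanged either way; single quotes mainly protect against accidental variable interpolation. A minimal sketch that compares the two strings without touching the filesystem:

```perl
use strict;
use warnings;

my $tree = "/appl/perform/workspace/htdocs/node_reports";

# Build the same glob pattern with single-quoted and double-quoted pieces.
my $single = $tree . '/*';
my $double = "$tree/*";

# Compare the pattern strings themselves; no glob() call is needed for this.
my $same = $single eq $double ? 1 : 0;
print $same ? "same pattern\n" : "different patterns\n";
```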