regexp for directory

This is really not much more use (and arguably less) than plain ol' grep, but i want to use it as the basis for a GUI code scanner/viewer. It checks only the .pl files in one directory (the second command-line argument, or the current directory if none is given) and requires a regexp as the first command-line argument. The output (linux) is exactly what my_nihilist describes below.

I believe McDarren, FunkyMonk, and even sort of ikegami helped me with their feedback to "ranking number of occurances" and "Counting in regular expressions". One of several modifications courtesy of toolic, below, led me to eliminate an unnecessary while loop (strange...) at about line 22; i left the loop in comments for those who might also find its unnecessity worth noting.

However, it did work just fine to begin with, toolic notwithstanding. There is no planet on which i would post code that i hadn't tested.

Update #2: see my final note below, in which i take Fletch's and toolic's advice about variable names and whitespace.

#!/usr/bin/perl -w
use strict;

my $regexp = $ARGV[0] || die "\nREQUIRES COMMAND-LINE ARGUMENT\n";
my $dir;
if (defined($ARGV[1])) {$dir = $ARGV[1]}
else {$dir = `pwd`}
chomp $dir;

print "\"$regexp\" in $dir/*.pl:\n";

my (%N_index, @files);

opendir(DIR, $dir) || die "can't open $dir";
@files = readdir(DIR);
closedir(DIR);

foreach my $pl (@files) {
        if ($pl =~ /\.pl$/) {
                my $content;
                open (PL, "<$dir/$pl") || die "can't open $dir/$pl"; 
#while (<PL>) {
                $content = do {local $/; <PL>};
#} that was the deleted "while loop"
                close (PL); my $N; 
                if ($content =~ /$regexp/) {
                        $_=$content;
                        $N =()= /$regexp/g;
                        $N_index{$pl}=$N;
                }   
        }   
}

my @rank = sort {$N_index{$b} <=> $N_index{$a}} keys %N_index;
foreach (@rank) {print "\t$N_index{$_} -- $_\n";}

Re: regexp for directory by toolic (Bishop) on Mar 17, 2008 at 15:28 UTC
When I run this code, all I get is this warning:

    > 674562.pl foo
    foo in /tmp/.pl:
    Use of uninitialized value in pattern match (m//) at 674562.pl line 26.

Line 26 is:

    if ($this =~ /$bit/) {

Since your documentation is a little unclear, I am not sure what this code should do. My guess is that you want to imitate the unix grep* command, except that you want a count of all occurrences of your regexp, rather than a count of all the lines on which the regexp occurs. If that is the case, I think `$this = do {local $/; <PL>};` ~~`if ($this =~ /$bit/)`~~ should not be inside the `while` loop. The entire file's contents are slurped into `$this` when you unset the input record separator, `$/`.

Update: copy'n'pasted the wrong code. Thanks my_nihilist.

Did you test this code yourself?

I have some other critiques:

- Use meaningful variable names. These do not convey much meaning: `%hash, $c, $this, $bit`
- Always check the success of open and opendir.
- There is no need for these quotes: `"$dir"`
- Make better use of whitespace for clarity: `my @rank = sort {$hash{$b} <=> $hash{$a}} keys %hash;`

Many of these guidelines can be found in the book Perl Best Practices. It is a good investment.
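For anyone following along, the difference between counting occurrences and counting matching lines might be sketched roughly like this. It is only an illustration, not the original poster's code: the sub names `count_occurrences` and `count_matching_lines` and the sample text are made up, and an in-memory filehandle stands in for a real .pl file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: slurp everything from the handle, then count every match.
sub count_occurrences {
    my ($fh, $regexp) = @_;
    my $content = do { local $/; <$fh> };     # $/ undefined => read it all at once
    return 0 unless defined $content;
    my $count = () = $content =~ /$regexp/g;  # list assignment, counted in scalar context
    return $count;
}

# Hypothetical helper: count lines containing at least one match (like grep -c).
sub count_matching_lines {
    my ($fh, $regexp) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /$regexp/;
    }
    return $count;
}

# Made-up sample text: "sort" appears three times on two lines.
my $sample = "sort sort\nnothing here\nsort\n";

open my $fh, '<', \$sample or die "can't open in-memory file: $!";
my $occurrences = count_occurrences($fh, 'sort');
close $fh;

open $fh, '<', \$sample or die "can't open in-memory file: $!";
my $lines = count_matching_lines($fh, 'sort');
close $fh;

print "occurrences: $occurrences, matching lines: $lines\n";
```

Note that the slurp happens once, outside any `while` loop. One caveat with the `() =` counting trick: if the regexp contains capture groups, the list holds the captures rather than the whole matches, so the number may not be what you expect.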
Re^2: regexp for directory by my_nihilist (Sexton) on Mar 17, 2008 at 16:36 UTC
I used this on linux and it works exactly as indicated, eg. "./test.p sort" produced:

    sort in /home/me/perl/.pl:
            6 -- big.II.pl
            4 -- big.pl
            1 -- sortoccurances.pl
            1 -- example3.pl
            1 -- test.pl

Nb. toolic: "if ($this =~ /$bit/)" never was inside the while loop!* I am sure that would be a problem. I am also sure from reading halfcountplus's other posts (ranking number of occurances) that slurping the entire file is intentional (how else could this work?).

However, if i got an error, i would be suspicious too. Perhaps if you replace the "/" with a "\" (line 20)?
Re^2: regexp for directory by halfcountplus (Hermit) on Mar 17, 2008 at 19:55 UTC
thanx for your feedback! i still don't understand why it didn't work tho!

the useful parts

- leave nothing inside while loop = no while loop!
- use "die" with open/opendir
- no need to quote "$dir" in open

the ignored parts

- i think my variable names are more meaningful than most, actually. They are distinctly different from one another and they are short. What would you call "%hash", "%associativearraywithcountforeachfile->key"? There is only one hash, and %hash is it. "$this" appears 3 times across 6 lines...it could be "$filecontent" i guess but i use "this" and "that" like tweedledee uses tweedledum. Kind of. "this" and "that"; it's cute ;P
- whitespace, smightspace. what about the aesthetic value of having the last two lines the same length? Surely that contributes to readability, albeit "in a different sense".

thanks again -- take care
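A rough sketch of the "use die with open/opendir" and "no need to quote $dir" points, in case it helps a later reader. The sub name `perl_files_in` and the file names are invented for the example, and it works in a throwaway directory from File::Temp rather than a real project directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the sorted names of the .pl files in $dir.
sub perl_files_in {
    my ($dir) = @_;
    # $dir is passed bare (no quotes needed); $! says why a failure happened.
    opendir(my $dh, $dir) or die "can't open directory $dir: $!";
    my @files = grep { /\.pl$/ } readdir($dh);
    closedir($dh);
    my @sorted = sort @files;
    return @sorted;
}

# Demo in a temporary directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(b.pl a.pl notes.txt)) {
    open(my $out, '>', "$dir/$name") or die "can't create $dir/$name: $!";
    close($out);
}

print join(', ', perl_files_in($dir)), "\n";   # prints "a.pl, b.pl"
```

Lexical handles (`my $dh`, `my $out`) are used here instead of bareword `DIR`/`PL`; that is a style choice of this sketch, not something the original script does.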
Re^3: regexp for directory by Fletch (Bishop) on Mar 17, 2008 at 21:13 UTC
You can get meaningful without hyperextraneoverbositude. `%matches_in_file` or `%count_for_file` are extremely descriptive without requiring me to read the entire goram piece of code to figure out what exactly is going in `%hash`. Any decent editor will also let you autocomplete the name after the first one or two times anyhow, so the overall length of the name isn't an excuse.

And if you're going to be lazy-cutesy, using the default subject variable `$_` at least has the virtue of possibly shortening your code. Absolutely context free names like "this" and "that" just mean the maintenance programmer that follows n months hence is going to curse your crappy style, not praise your brevity and wit.

Addendum: As to the lack of whitespace in the penultimate line, I'd just say it's people who write stuff like that in production code that give Perl the (somewhat deserved :) reputation for being executable line noise. Without reasonable whitespace you've got to scan back and forth to see where the breaks are (of course Mr. Maintenance programmer probably just learns to run anything you ever wrote through perltidy and tosses the originals away day one . . . ).

The cake is a lie. The cake is a lie. The cake is a lie.
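To make the naming and whitespace point concrete, here is one possible way the ranking step could look. It is a sketch only: `%count_for_file` is one of the names suggested above, while the sub name `rank_by_count`, the tie-break on file name, and the counts themselves are assumptions added for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up counts standing in for the per-file match counts.
my %count_for_file = (
    'big.pl'    => 4,
    'test.pl'   => 1,
    'big.II.pl' => 6,
);

# Hypothetical helper: file names by descending count, ties broken by name.
sub rank_by_count {
    my ($count_ref) = @_;
    my @ranked = sort {
        $count_ref->{$b} <=> $count_ref->{$a}
            or $a cmp $b
    } keys %$count_ref;
    return @ranked;
}

foreach my $file (rank_by_count(\%count_for_file)) {
    print "\t$count_for_file{$file} -- $file\n";
}
```

With the sample data this lists big.II.pl, then big.pl, then test.pl. The tie-break just keeps the order stable when two files have the same count; the original one-liner leaves that order up to the hash.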
Re^4: regexp for directory by halfcountplus (Hermit) on Mar 18, 2008 at 13:28 UTC
