ranking number of occurrences

halfcountplus has asked for the wisdom of the Perl Monks concerning the following question:

I want to compare the number of times a string occurs in a set of files. I read each file into a single string using:


my $all = do {local $/; <IN>};

Then I got the number of instances using McDarren's code from "Counting in regular expressions" and did this:


#!/usr/bin/perl -w
use strict;

my $bit="ace"; my (%hash);
while (<DATA>) {
        if ($_ =~ /$bit/) {
                my @ray=($_ =~ /$bit/g); my $c=scalar @ray;
                $hash{$_}=$c;
        }   
}

my @rank=sort{$hash{$b}<=>$hash{$a}}keys%hash;
print @rank;

__DATA__
In the wire game, a "mob" composed of dozens of 
grifters simulates a "wire store", i.e., a place 
where results from horse races are received by 
telegram and posted on a large board, while also 
being read aloud by an announcer. The griftee is 
given secret foreknowledge of the race results 
minutes before the race is broadcast, and is 
therefore able to place a sure bet at the wire store. 
In reality, of course, the con artists who set up 
the wire store are the providers of the inside 
information, and the mark eventually is led to place 
a large bet, thinking it to be a sure win. At this 
point, some mistake is made, which actually makes the bet a loss.

Is there a less troublesome way to do this? Even better, is there a way to map a hash key to an array value without looping through with "while", so that I could look up the number of occurrences (from %hash) for each sorted array value (which is the name of a hash key)?
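
One possible approach to both parts of the question, as a minimal sketch: slurp each file, count the matches, sort the file names by count, and then look up each count directly with `$count{$name}` (or all of them at once with a hash slice). The helper names `count_in_string` and `slurp`, the file names, and their contents are made up for illustration; in-memory filehandles stand in for real files so the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for real files: name => contents.
my %files = (
    'a.txt' => "ace of spaces\nplace your bet\n",
    'b.txt' => "no match here\n",
    'c.txt' => "race, pace, face, lace\n",
);

# Count non-overlapping literal occurrences of $needle in $text.
sub count_in_string {
    my ($text, $needle) = @_;
    my $count = () = $text =~ /\Q$needle\E/g;
    return $count;
}

# Read everything from a filehandle into one string.
sub slurp {
    my ($fh) = @_;
    local $/;    # undefined record separator: one read returns the whole file
    return scalar <$fh>;
}

my %count;
for my $name (keys %files) {
    # Opening a reference to a scalar gives an in-memory filehandle.
    open my $fh, '<', \$files{$name} or die "Cannot open $name: $!";
    $count{$name} = count_in_string(slurp($fh), 'ace');
    close $fh;
}

# Sort names by count, descending; ties are broken by name.
my @rank = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

# Every element of @rank is a key of %count, so no extra loop over the
# data is needed to find its count.
print "$_: $count{$_}\n" for @rank;

# A hash slice returns all counts in ranked order in one step.
my @counts = @count{@rank};
```

Because the sorted array holds hash keys, the count for any ranked entry is a single hash lookup away.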


Replies are listed 'Best First'.
Re: ranking number of occurrences by ikegami (Patriarch) on Mar 16, 2008 at 22:37 UTC

while (<DATA>) {
    my $c = () = /\Q$bit\E/g;
    $hash{$_} = $c if $c;
}

my @rank = sort { $hash{$b} <=> $hash{$a} } keys %hash;

my $to_print = 10;
$to_print = @rank if @rank < 10;

print "$_: $hash{$_}\n" for @rank[0..$to_print-1];

Avoided intermediary array. Converted $bit from plain text to a regexp using \Q..\E. Produced better output.

If there are two (or more) identical lines, they will only appear once in the hash, but that doesn't look like a problem.

Update: I just noticed there's a question at the bottom. But I don't understand what you're asking anyway.
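
The two idioms in this reply can be seen in isolation in the following sketch. The sample line and the search string "a.c" are made up for illustration; they were chosen so that the search string contains a regexp metacharacter.

```perl
use strict;
use warnings;

my $line = "abc a.c axc a.c";
my $bit  = "a.c";    # hypothetical search string containing a metacharacter

# Assigning the match to an empty list forces list context; the scalar on
# the far left then receives the number of elements the match returned.
my $as_regex   = () = $line =~ /$bit/g;        # "." matches any character
my $as_literal = () = $line =~ /\Q$bit\E/g;    # "." matches only a dot

print "regex: $as_regex, literal: $as_literal\n";
```

Without \Q..\E the dot matches any character, so "abc" and "axc" are counted as well; with it, only the two literal "a.c" occurrences are counted.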
Re: ranking number of occurrences by FunkyMonk (Bishop) on Mar 16, 2008 at 22:45 UTC
If you really want to compare the number of times a string occurs in a set of files, as opposed to the lines in which the pattern occurred, you'll need to add to the hash's value: `$hash{$_} += $c;` for your code, or `$hash{$_} += $c if $c;` for ikegami's.
Re^2: ranking number of occurrences by ikegami (Patriarch) on Mar 16, 2008 at 23:42 UTC
I don't think so. Your code says that line/file "a48754a4397543a43753a" contains "a" 8 times. Even if $_ represents a file instead of a line, no change is needed. If he wants the total, then he'd need

$hash{$_} = $c;   # Per line/file count
$total += $c;     # Total count
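
One reading of this objection is that `+=` adds the counts together whenever the same line (the hash key) is seen more than once. The sketch below illustrates that reading; the duplicated input line and the hash names `%assigned` and `%accumulated` are assumptions made for illustration.

```perl
use strict;
use warnings;

my $bit   = "a";
# The same line appears twice, which is the case where = and += differ.
my @lines = ("a48754a4397543a43753a", "a48754a4397543a43753a", "xyz");

my (%assigned, %accumulated);
my $total = 0;

for my $line (@lines) {
    my $c = () = $line =~ /\Q$bit\E/g;
    next unless $c;
    $assigned{$line}     = $c;    # count for this line, regardless of repeats
    $accumulated{$line} += $c;    # sums the counts of every repeat of the line
    $total              += $c;    # grand total across all lines
}

my $key = "a48754a4397543a43753a";
print "assigned: $assigned{$key}, accumulated: $accumulated{$key}, total: $total\n";
```

With plain assignment the line is recorded as containing "a" 4 times; with `+=` the duplicate pushes the stored value to 8, while a separate `$total` keeps the overall count without distorting the per-line figure.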
Re: ranking number of occurrences by grizzley (Chaplain) on Mar 17, 2008 at 07:51 UTC
I would do it this way:

#!/usr/bin/perl -w
use strict;

my $bit="ace"; my (%hash);
while (<DATA>) {
        while($_ =~ /$bit/g) {
                $hash{$_}++;
        }
}

And then you can put it into an array, where position i holds a reference to an array of the lines having i occurrences of $bit:

my $bit="ace"; my (%hash, @rank);
while (<DATA>) {
        while($_ =~ /$bit/g) {
                $hash{$_}++;
        }
        push @{$rank[$hash{$_}]}, $_;
}

# see what the structure looks like
# use Data::Dumper;
# print Dumper \@rank;

for(@rank) {
        for(@$_) {
                print;
        }
}
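
A variation on this bucket idea, as a sketch only: the lines are grouped by match count and the buckets are then walked from the highest count down, so the output is ranked and counts with no lines are skipped. The sample lines and the `@ordered` array are made up for illustration, and the count is taken with the list-assignment idiom rather than an inner loop.

```perl
use strict;
use warnings;

my $bit   = "ace";
my @lines = ("place a bet", "no match", "ace race pace", "a trace");

my @rank;    # $rank[$n] holds a reference to the lines with $n matches
for my $line (@lines) {
    my $n = () = $line =~ /\Q$bit\E/g;
    push @{ $rank[$n] }, $line;
}

# Walk the buckets from the highest count down, skipping the zero bucket
# and any count for which no line was stored.
my @ordered;
for my $n (reverse 1 .. $#rank) {
    next unless $rank[$n];
    push @ordered, map { "$n: $_" } @{ $rank[$n] };
}

print "$_\n" for @ordered;
```

Lines with the same count keep their input order inside a bucket, so no sort is needed at all.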