Re^2: using hash to find frequency count

Thanks, ikegami. Your comments made the code much more readable. I don't fully understand what the match is doing, though: it does not appear to split the line on delimiters, and I am unclear on the function of the parens.
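To make the "match versus split" difference concrete, here is a minimal sketch (the sample sentence is made up) that pulls words out of one line in two ways. A list-context match describes what a word looks like; a `split` describes what lies between words. For this input, and with an `a-z` character class like the one in ikegami's reply, both produce the same list.

```perl
use strict;
use warnings;

my $line = "Alice's sister was sleepy; the White Rabbit ran by.";   # made-up sample

# Describe the words: list-context m//g returns every substring that
# matches the capture group, skipping spaces and punctuation.
my @matched = $line =~ /([a-zA-Z'\-]+)/g;

# Describe the gaps: split on runs of characters that are NOT in the
# word class (split drops the trailing empty field left by the '.').
my @split = split /[^a-zA-Z'\-]+/, $line;

print join('|', @matched), "\n";   # Alice's|sister|was|sleepy|the|White|Rabbit|ran|by
print join('|', @split),   "\n";   # the same list
```

The parentheses mark what each match should return; with `/g` in list context, Perl keeps matching and collects every capture into the list.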
I made the repairs you suggested, but my new code is still missing something:
C:\scripts>wordcount.pl alice.txt
distinct words: 0
frequency of most common word:
common word:

use strict;

my $maxcount;
my $find;
my $file;
my %hash;
my $count;

while(<>){
  my @list = shift =~ /([a-xA-Z'\-]+)/g;
  foreach my $word (@list) {
    $count = ++$hash{lc $word};
    if ($count > $maxcount) {
      $maxcount = $count;
    }
  }
}

my $numwords= keys %hash;
print "distinct words: $numwords\n";
print "frequency of most common word: $maxcount\n";
print "common word: $find";
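A likely reason for the zero count: at file scope (outside any sub), a bare `shift` means `shift @ARGV`, not "the current line". The `<>` operator has already shifted the file name off `@ARGV` in order to open it, so inside the loop `shift` returns undef and the match finds nothing. The sketch below checks this, using a throwaway temp file (File::Temp, a core module) as a stand-in for alice.txt; the sample text is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway input file standing in for alice.txt (made-up text).
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "Alice was beginning to get very tired\n";
close $fh;

@ARGV = ($filename);   # as if run as: wordcount.pl <file>

my (@from_shift, @from_topic);
while (<>) {
    # <> has already shifted the file name off @ARGV to open it, so this
    # bare shift (i.e. shift @ARGV at file scope) returns undef.
    my $shifted = shift;
    push @from_shift, $shifted =~ /([a-zA-Z'\-]+)/g if defined $shifted;

    # The line just read is in $_, which is what the match should scan.
    push @from_topic, /([a-zA-Z'\-]+)/g;
}

print "words found via shift: ", scalar(@from_shift), "\n";   # 0
print "words found via \$_:    ", scalar(@from_topic), "\n";   # 7
```

Separately, `$find` is declared but never assigned in this version, which is why the "common word:" line prints nothing.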

Re^3: using hash to find frequency count by jjohhn (Scribe) on May 15, 2005 at 04:07 UTC
Changing

while(<>){
  my @list = shift =~ /([a-xA-Z'\-]+)/g;

to

while(<>){
  my @list = $_ =~ /([a-xA-Z'\-]+)/g;

now gives me results. Could you explain a little what the match is doing to parse the lines?

C:\scripts>wordcount.pl alice.txt
distinct words: 2815
frequency of most common word: 1779
common word:
Re^4: using hash to find frequency count by jjohhn (Scribe) on May 15, 2005 at 04:23 UTC
The most common word, not surprisingly, is "the". The match was the key, as ikegami said. I would greatly appreciate a hint about how it produces an appropriate list of words to count. This improved approach does not appear to split a string on delimiters, but rather to alter the value of $_.
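On whether the match alters `$_`: a list-context `m//g` only reads its target and returns the captured pieces; the string itself is left as it was. A minimal sketch with a made-up sample line, set the way `while(<>)` would set it:

```perl
use strict;
use warnings;

$_ = "Down the Rabbit-Hole\n";   # made-up sample line, newline included
my $before = $_;

# In list context, m//g scans $_ and returns every captured substring.
# It reads $_; it does not modify it.
my @list = /([a-zA-Z'\-]+)/g;

print join('|', @list), "\n";                                    # Down|the|Rabbit-Hole
print '$_ unchanged: ', ($_ eq $before ? 'yes' : 'no'), "\n";    # yes
```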
Re^4: using hash to find frequency count by ikegami (Patriarch) on May 15, 2005 at 13:44 UTC
> could you explain a little what the match is doing to parse the lines?

It says: Match a "word", defined as a sequence of one or more letters, hyphens and apostrophes (`[...]+`). When you find that, return it (`()`). Repeat (`/g`).

That definition of a word is rather primitive, and may need to be tweaked.

use strict;

my $maxcount;
my $find;
my $file;
my %hash;
my $count;

while (<>) {
   while (/([a-zA-Z'\-]+)/g) {
      my $word = $1;
      $count = ++$hash{lc $word};
      if ($count > $maxcount) {
         $find = $word;
         $maxcount = $count;
      }
   }
}

my $numwords = keys %hash;
print "distinct words: $numwords\n";
print "frequency of most common word: $maxcount\n";
print "common word: $find";

__END__
output of perl script.pl script.pl
==================================
distinct words: 25
frequency of most common word: 7
common word: my
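ikegami's loop uses the match in scalar context instead: each `while (/.../g)` test finds the next word, leaves it in `$1`, and records in `pos()` where the following search starts. The sketch below (sample text is made up) prints those positions and also shows what the `[a-xA-Z'\-]` class from the earlier posts does: `a-x` omits lowercase y and z, so words containing them are cut short, whereas ikegami's version uses `a-z`.

```perl
use strict;
use warnings;

my $text = "Very lazy, very sleepy";   # made-up sample line

# Scalar-context /g, as in ikegami's loop: one word per iteration,
# captured in $1, with pos($_) marking where the next search resumes.
$_ = $text;
my @wide;
while (/([a-zA-Z'\-]+)/g) {
    push @wide, $1;
    print "matched '$1', next search starts at offset ", pos($_), "\n";
}

# The class from the earlier posts: a-x leaves out lowercase y and z.
my @narrow = $text =~ /([a-xA-Z'\-]+)/g;

print "a-z: ", join('|', @wide),   "\n";   # Very|lazy|very|sleepy
print "a-x: ", join('|', @narrow), "\n";   # Ver|la|ver|sleep
```

With the narrow class, "sleepy" is counted as "sleep" and "lazy" as "la", which quietly skews both the distinct-word count and the frequencies.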