Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,
I've been trying to put a word frequency counter together, but I keep getting an "uninitialised value in pattern match" warning.
my $file = "C:\\areopagitica.txt";
open(IN, $file) || die "File not found";
my @thisfile = <IN>;
close(IN);
chomp @thisfile;
my %seen=();
while (@thisfile) {
    while ( /(\w['\w-]*)/g ) {
        $seen{lc $1}++;
    }
}
foreach my $word (sort { $seen{b} <=> $seen{a} } keys %seen) {
    printf "%5d %s\n", $seen{$word}, $word;
}
I'd be grateful if somebody could let me know what I'm doing wrong.

Replies are listed 'Best First'.
Re: Word Frequency counter
by GrandFather (Saint) on Oct 02, 2008 at 20:30 UTC

    For a start

    while (@thisfile) {

    doesn't do what you think. In particular, it is not

    for (@thisfile) {

    Next:

    $seen{b} <=> $seen{a}

    compares the values for keys 'a' and 'b', not for the two sort variables' ($a and $b) contents as you are hoping. Cleaning those problems up, fixing a few other style issues and providing some sample data gives:

    use strict;
    use warnings;

    my $fileData = <<DATA;
    Greetings,
    I've been trying to put a word frequency counter together but I keep getting an unitialised value in pattern match.
    I'd be grateful if somebody could let me know what I'm dong wrong.
    I added a line to get some repeated words.
    DATA

    my %seen;

    open my $inFile, '<', \$fileData;
    for (grep {chomp; length} <$inFile>) {
        $seen{lc $1}++ while /(\w['\w-]*)/g;
    }
    close ($inFile);

    printf "%5d %s\n", $seen{$_}, $_
        for sort { $seen{$b} <=> $seen{$a} } keys %seen;

    Prints:

        2 a
        2 to
        2 i
        1 i've
        1 know
        1 put
        1 if
        1 unitialised
        1 greetings
        1 i'd
        1 frequency
        1 wrong
        1 let
        1 could
        1 in
        1 keep
        1 line
        1 repeated
        1 trying
        1 what
        1 value
        1 me
        1 match
        1 grateful
        1 i'm
        1 word
        1 be
        1 some
        1 somebody
        1 but
        1 added
        1 words
        1 dong
        1 been
        1 get
        1 together
        1 getting
        1 pattern
        1 counter
        1 an

    Perl reduces RSI - it saves typing
      Thanks for all the replies above, which explain where I went wrong and why, and especially to GrandFather for showing a different, less verbose way of getting the task working.
Re: Word Frequency counter
by Fletch (Bishop) on Oct 02, 2008 at 20:16 UTC

    You've used the literal barewords "a" and "b" as hash keys in your sort comparator, where you wanted the variables $a and $b.
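    To make the difference concrete, here is a minimal sketch of a comparator that uses $a and $b. The %seen contents and the helper name by_count_desc are made up for illustration, and the alphabetical tie-break is an addition that the original code does not have. With $seen{b} <=> $seen{a}, every comparison looks up the same two fixed keys, so the result tells sort nothing about the words actually being compared.

```perl
use strict;
use warnings;

# Hypothetical sample counts, not taken from the original post.
my %seen = (the => 3, cake => 1, lie => 2);

# Return the keys of a count hash, most frequent first.
# $a and $b are the package variables that sort sets for each comparison;
# ties are broken alphabetically so the order is repeatable (an addition
# for this sketch, not part of the original code).
sub by_count_desc {
    my ($counts) = @_;
    return sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
}

printf "%5d %s\n", $seen{$_}, $_ for by_count_desc(\%seen);
```

    Swapping $a and $b in the numeric comparison would give ascending order instead.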

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Word Frequency counter
by toolic (Bishop) on Oct 02, 2008 at 20:23 UTC
    Probably unrelated to your problem, but your outer while loop will be infinite if your @thisfile array has any contents. It would be better to use a for loop instead:
    for (@thisfile) {
      Actually, this is related. Anonymous' regex is trying to match against $_, but while loops (and everything else in this code) don't set $_, so it's undefined. If you change the outer while to a for then you have something that sets $_.
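      As a rough sketch of that point (the sub name count_words and the input lines are invented for illustration), a for loop aliases $_ to each array element in turn, so a bare pattern match inside it has something to match against:

```perl
use strict;
use warnings;

# Count words in a list of lines. The for loop aliases $_ to each
# element in turn, so the bare match below operates on the current line.
sub count_words {
    my @lines = @_;
    my %seen;
    for (@lines) {
        $seen{lc $1}++ while /(\w['\w-]*)/g;
    }
    return \%seen;
}

# Hypothetical input lines, made up for illustration.
my $counts = count_words("The cat sat", "the cat's mat");
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```

      A while loop only sets $_ in the special case of reading from a filehandle, as in while (<IN>) { ... }; while (@thisfile) merely tests whether the array is non-empty.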

      Update: Complete gibberish fixed. Now it says what I meant to say

      --DrWhy

      "If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."

Re: Word Frequency counter
by apl (Monsignor) on Oct 02, 2008 at 20:34 UTC
    ... and, as always, you should use strict; use warnings;
Re: Word Frequency counter
by planetscape (Chancellor) on Oct 03, 2008 at 12:10 UTC
Re: Word Frequency counter
by Lawliet (Curate) on Oct 02, 2008 at 20:21 UTC

    Update: ~This is not the reply you are looking for~
    My original reply was removed due to embarrassment. (Read OP too quickly, whoops.)

    I'm so adjective, I verb nouns!

    chomp; # nom nom nom