SkullOne has asked for the wisdom of the Perl Monks concerning the following question:

I have what seems to be a relatively simple problem on my hands, hoping a fellow monk can enlighten me.
I have a file with thousands of entries, lots of duplicates, in seemingly random order and numbers. I want to report how many times each entry occurs.
For example, if 'blah' occurs in the file 8 times, I would like to see "blah = 8" printed.
It's just a one-column file, but I want every entry found to be reported:
Foo, bar, none, each with the number of times it occurred.
I've tried making arrays and doing count++ each time, but it doesn't appear to count per item, just per line.

Does anyone have an idea of how to accomplish this?

Replies are listed 'Best First'.
Re: Reporting entries in a file
by psini (Deacon) on May 30, 2008 at 22:02 UTC

    If I understand right, you want to count the number of occurrences of every distinct "word" in your file.

    To do so you should extract the "words" from the file and use a hash (not an array) to count the occurrences. At the beginning the hash is empty; for every word you extract, you check if defined($hash{$word}): if it is defined then you increment the value, else you set the value to one.
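The approach described above can be sketched as follows. This is only a minimal illustration: the helper name count_words and the sample data are made up for the example, and it assumes words are separated by whitespace. In everyday Perl the if/else is usually shortened to $count{$word}++, which has the same effect because a missing entry is treated as zero.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_words is a hypothetical helper name, not something from the thread.
# It takes a list of lines and returns a hash reference of word => count.
sub count_words {
    my @lines = @_;
    my %count;                              # the hash starts out empty
    for my $line (@lines) {
        # split ' ' splits on whitespace and ignores leading blanks
        for my $word (split ' ', $line) {
            if (defined $count{$word}) {
                $count{$word}++;            # seen before: increment
            }
            else {
                $count{$word} = 1;          # first sighting: start at one
            }
        }
    }
    return \%count;
}

# Made-up sample data, one entry per line.
my $count = count_words("blah\n", "foo\n", "blah\n", "bar\n", "blah\n");
print "$_ = $count->{$_}\n" for sort keys %$count;
```

With the sample data this prints "bar = 1", "blah = 3" and "foo = 1", one per line. To read a real file instead, the lines could come from <> rather than a literal list.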

    Careful with that hash Eugene.

      Thank you, a hash is exactly what I needed, not an array.
      I saw an example online, and modified it a tad:
      #!/usr/bin/perl
      while (<>) {
          @words = split(/\n+/);
          foreach $word (@words) {
              $count{$word}++;
          }
      }
      foreach $word (sort by_count keys %count) {
          print "$word \: $count{$word}\n";
      }
      sub by_count {
          $count{$b} <=> $count{$a};
      }
        @words = split(/\n+/);

        I guess you meant: @words = split(/\s+/);

        Correction, the hash would've worked fine, but ended up using an array again.

      From what SkullOne said ("It's just a one-column file"), s/he had only one word per line, so the task is even easier (we need only to count the different types of line - maybe after stripping leading and trailing spaces). The concept of using a hash would be the same, but we don't need to split the line into words.

      A solution that makes do with arrays instead of hashes and is easy to code too, provided that we don't need to strip the spaces, would be to slurp the whole file into an array, sort the array, and then (in a loop through the array) count the number of consecutive equal entries. This would get us an alphabetically sorted list of the words with their associated counts.
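A rough sketch of the sort-and-count idea from the paragraph above, for illustration only. The helper name count_runs and the sample data are assumptions made for this example; it relies on sorting to make equal entries adjacent and then counts each run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_runs is a hypothetical helper name, not something from the thread.
# It takes a list of entries and returns a list of [entry, count] pairs
# in alphabetical order.
sub count_runs {
    my @sorted = sort @_;                # equal entries become adjacent
    my @result;
    for my $entry (@sorted) {
        if (@result && $result[-1][0] eq $entry) {
            $result[-1][1]++;            # same as previous entry: extend the run
        }
        else {
            push @result, [$entry, 1];   # different entry: start a new run
        }
    }
    return @result;
}

# Made-up sample data; for a real file the list could come from
# something like: chomp(my @lines = <>);
my @pairs = count_runs(qw(foo blah bar blah foo blah));
print "$_->[0] = $_->[1]\n" for @pairs;
```

With the sample data this prints "bar = 1", "blah = 3" and "foo = 2", one per line. Sorting costs more than a single pass with a hash, but the output comes out alphabetically ordered without a separate sort of the keys.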

      -- 
      Ronald Fischer <ynnor@mm.st>
Re: Reporting entries in a file (unix one-liner)
by toolic (Bishop) on May 31, 2008 at 01:49 UTC
    The quick *nix solution uses the sort and uniq utilities to produce a numerically sorted list of word counts:
    sort file | uniq -c | sort -n

    will print out something like:

    1 foo
    3 bar
    8 blah
      If you are not in a *nix environment, this is a Perl-only one-liner:
      perl -lane '$h{$_}++ for @F; END{print qq{$h{$_} $_\n} for sort keys %h}' file
      Switch to double quotes in Windows.

      print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});