in reply to How do I count the frequency of words in a file and save them for later?

I usually use this script and pipe the results to a text file:

#!/usr/local/bin/perl
# $Id: wordfreq.perl,v 1.13 2001/05/16 23:46:40 doug Exp $
# http://www.bagley.org/~doug/shootout/ <= old URL; dead now
# http://dada.perl.it/shootout/wordfreq.perl.html <= URL as of time this post was written

# Tony Bowden suggested using tr versus lc and split(/[^a-z]/)

use strict;

my %count = ();
while (read(STDIN, $_, 4095) and $_ .= <STDIN>) {
    tr/A-Za-z/ /cs;
    ++$count{$_} foreach split(' ', lc $_);
}
my @lines = ();
my ($w, $c);
push(@lines, sprintf("%7d\t%s\n", $c, $w)) while (($w, $c) = each(%count));
print sort { $b cmp $a } @lines;

planetscape

Replies are listed 'Best First'.
Re: Answer: How do I count the frequency of words in a file and save them for later?
by TedPride (Priest) on May 26, 2005 at 06:13 UTC
    There have been, and probably will be, quite a few posts about word counts. This solution doesn't work in every case. To give just one example, "can't" is counted as one "can" and one "t". Other solutions often turn it into "cant", but what is really needed is a test that the apostrophe has at least one letter on each side. Also, what about words split across the end of a line? A word like:

    google-
    plex

    should be converted to googleplex before counting. I imagine there are one or two other cases to handle as well.

    I'm not saying this is necessarily a bad place to start, but you need to program in some modifications. Better get cracking.
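
For what it's worth, here is a minimal sketch of the two modifications described above. It is not part of the original script: count_words is a hypothetical helper name, and the sketch assumes that a hyphen directly before a line break always marks a split word, which will wrongly merge genuine hyphenated compounds that happen to wrap there. It also only treats ASCII letters as word characters, as the original does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): count the words in a
# string, keeping inner apostrophes and rejoining line-end hyphenation.
sub count_words {
    my ($text) = @_;
    my %count;

    # Rejoin words split across lines: "google-\nplex" becomes "googleplex".
    # Assumption: a hyphen right before a line break is always a split word.
    $text =~ s/([A-Za-z])-[ \t]*\r?\n[ \t]*([A-Za-z])/$1$2/g;

    # A word is a run of letters, optionally followed by groups of
    # apostrophe-plus-letters, so an apostrophe is kept only when it has
    # at least one letter on each side.
    while ($text =~ /([A-Za-z]+(?:'[A-Za-z]+)*)/g) {
        $count{ lc $1 }++;
    }
    return \%count;
}

# Small demonstration on an inline sample string.
my $sample = "I can't find the google-\nplex. Can't you?\n";
my $count  = count_words($sample);

# Highest count first, ties broken alphabetically.
printf "%7d\t%s\n", $count->{$_}, $_
    for sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
```

To run it on a file instead of the sample string, one option is to slurp STDIN (for example with `local $/`) and pass the result to count_words; whole-file slurping is needed here anyway, because the hyphen rejoin has to see both sides of the line break.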