in reply to a question about making a word frequency matrix
Are you looking for something like this?
use warnings;
use strict;

my $word = qr/(?<!\p{Alnum})\p{Alnum}+(?!\p{Alnum})/;
my %count;
my $counter;

while (my $line = <DATA>) {
    while ($line =~ /($word('$word)?)/g) {
        $count{$1}++;
    }
}

for (sort { $count{$b} <=> $count{$a} || lc $a cmp lc $b } keys %count) {
    printf "%15s %5d\n", $_, $count{$_};
    last if ++$counter > 100;
}

__DATA__
"Hello World!"
"Oh poor Yorick, his world I knew well yes I did"
"don't won't, can't shouldn't, you'll, it's, etc."
"Señor Montóya's resüme isn't ápropos."
the, the, the, the, the, the, the, the, the, the
It isn't very clear what you mean by "words-by-words matrix".
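In case the regex is the confusing part, here is a minimal sketch of how the same token pattern treats contractions and possessives as single words. The sample string is made up for illustration, and I've used a non-capturing group for the apostrophe part, which the script above does not.

```perl
use strict;
use warnings;

# Same token pattern as in the script: a run of alphanumerics that is
# not adjacent to another alphanumeric on either side.
my $word = qr/(?<!\p{Alnum})\p{Alnum}+(?!\p{Alnum})/;

# Hypothetical sample input, chosen only for illustration.
my $text = "don't stop, it's the dog's bone";

my @tokens;
# A word, optionally followed by an apostrophe and another word,
# so "don't" is one token rather than "don" and "t".
while ($text =~ /($word(?:'$word)?)/g) {
    push @tokens, $1;
}

print join('|', @tokens), "\n";   # prints: don't|stop|it's|the|dog's|bone
```

Counting is then just a matter of incrementing a hash entry per token, as the script does with %count.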