Ah. You've clarified what you mean a bit.
OK, here's a simple version that is limited to finding the top five words (so the output will fit across one standard terminal screen). Adjust $limit and redirect the output to a file for larger numbers.
Not necessarily the best way, but not too bad:
use warnings;
use strict;

$/ = '';    # paragraph mode: each read returns one blank-line-separated paragraph

my $word = qr/(?<!\p{Alnum})\p{Alnum}+(?!\p{Alnum})/;
my %count;
my $paragraphs;
my $counter;
my @results;
my $limit = 5;

# Count every word overall and per paragraph ($. is the paragraph number).
while ( my $line = <DATA> ) {
    while ( $line =~ /($word('$word)?)/g ) {
        $count{$1}{count}++;
        $count{$1}{$.}++;
        $paragraphs = $.;
    }
}

# Keep the $limit most frequent words; ties are ordered case-insensitively.
for ( sort { $count{$b}{count} <=> $count{$a}{count} || lc $a cmp lc $b } keys %count ) {
    last if ++$counter > $limit;
    push @results, $_;
}

# Header row, totals row, then one row per paragraph.
print ' ' x 12;
printf "|%12s", $_ for @results;
print "\n";
print 'Total count:';
printf "|%12s", $count{$_}{count} for @results;
print "\n";
print '-' x ( 13 * ( $limit + 1 ) ), "\n";
for my $line ( 1 .. $paragraphs ) {
    printf "Prgrph %4s:", $line;
    printf "|%12s", $count{$_}{$line} || '0' for @results;
    print "\n";
}

__DATA__
"Hello World!"
"Oh poor Yorick, his world I knew well yes I did"
"don't won't, can't shouldn't, you'll, it's, etc."
"Señor Montóya's resüme isn't ápropos."
the, the, the, the, the, the, the, the, the, the

"Hello World!"
"Oh poor Yorick, his world I knew well yes I did"
"don't won't, can't shouldn't, you'll, it's, etc."
"Señor Montóya's resüme isn't ápropos."
the, the, the, the, the, the, the, the, the, the

"Hello World!"
"Oh poor Yorick, his world knew well yes did"
"don't won't, can't shouldn't, you'll, it's, etc."
"Señor Montóya's resüme isn't ápropos."
the, the, the, the, the, the, the, the, the, the
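If it helps to see what the word pattern actually picks up, here is a minimal sketch that applies the same $word regex to one line of the sample data. The tokens() helper is only a name made up for this illustration; it is not part of the script above.

```perl
use strict;
use warnings;

# Same pattern as in the script: a run of alphanumerics with no
# alphanumeric character immediately before or after it.
my $word = qr/(?<!\p{Alnum})\p{Alnum}+(?!\p{Alnum})/;

# Hypothetical helper (not in the original post): returns every match,
# where a match is a word optionally followed by an apostrophe and
# another word, so contractions stay in one piece.
sub tokens {
    my ($text) = @_;
    my @found;
    while ( $text =~ /($word('$word)?)/g ) {
        push @found, $1;
    }
    return @found;
}

print join( '|', tokens(q{don't won't, can't shouldn't, you'll, it's, etc.}) ), "\n";
```

This should print don't|won't|can't|shouldn't|you'll|it's|etc, which shows that contractions are counted as single words and that the trailing punctuation is dropped.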
In reply to Re^2: a question about making a word frequency matrix
by thundergnat
in thread a question about making a word frequency matrix
by peacekorea