Some things that often get overlooked:
Words with internal apostrophes (don't, won't, can't, shouldn't, you'll, it's, etc.).
Words with non-ASCII characters (á, ñ, ÿ, etc.).
This handles both:
##########################################################
#! /usr/bin/perl
use warnings;
use strict;

# A word: a run of alphanumerics with no alphanumeric on either side.
my $word = qr/(?<!\p{Alnum})\p{Alnum}+(?!\p{Alnum})/;
my %count;

while (<DATA>) {
    my $line = lc $_;
    # Count each word, keeping an internal apostrophe part (don't, it's) attached.
    while ($line =~ /($word('$word)?)/g) {
        $count{$1}++;
    }
}

printf "%15s %5d\n", $_, $count{$_} for sort keys %count;

__DATA__
"Hello World!"
"Oh poor Yorick, his world I knew well yes I did"
"Words with internal apostrophes. (don't won't, can't shouldn't, you'll, it's, etc.)"
"Señor Montóya's resüme isn't ápropos."
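To see why those two cases matter, here is a rough sketch that runs one of the sample lines through a naive ASCII-only split and through the pattern from the code above. The names @naive and @aware are just illustrative, and the `use utf8` pragma plus the STDOUT encoding layer are my assumptions about how the script is saved and run (as UTF-8), not something the original code relies on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # the string literal below is UTF-8 (assumption)
binmode STDOUT, ':encoding(UTF-8)';    # encode decoded characters on output

my $text = lc "Señor Montóya's resüme isn't ápropos.";

# Naive approach: split on anything that is not an ASCII letter or digit.
my @naive = grep { length } split /[^A-Za-z0-9]+/, $text;

# Pattern from the code above: an alphanumeric run, optionally followed
# by an apostrophe and another alphanumeric run.
my $word  = qr/(?<!\p{Alnum})\p{Alnum}+(?!\p{Alnum})/;
my @aware = $text =~ /($word(?:'$word)?)/g;

print "naive: @naive\n";   # naive: se or mont ya s res me isn t propos
print "aware: @aware\n";   # aware: señor montóya's resüme isn't ápropos
```

The naive split chops every word at its accented letter or apostrophe, so the counts end up full of fragments like "s" and "t". The Unicode-property version keeps each word whole.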
In reply to Re: stripped punctuation by thundergnat, in thread stripped punctuation by thealienz1