in reply to stripped punctuation
Some things that often get overlooked:
Words with internal apostrophes (don't, won't, can't, shouldn't, you'll, it's, etc.).
Words with non-ASCII characters (á, ñ, ÿ, etc.).
This handles both:
#!/usr/bin/perl
use warnings;
use strict;

# A word is a maximal run of alphanumerics (any script, not just ASCII).
my $word = qr/(?<!\p{Alnum})\p{Alnum}+(?!\p{Alnum})/;
my %count;

while (<DATA>) {
    my $line = lc $_;
    # Allow one internal apostrophe, so "don't" counts as a single word.
    while ($line =~ /($word(?:'$word)?)/g) {
        $count{$1}++;
    }
}

printf "%15s %5d\n", $_, $count{$_} for sort keys %count;

__DATA__
"Hello World!"
"Oh poor Yorick, his world I knew well yes I did"
"Words with internal apostrophes. (don't won't, can't shouldn't, you'll, it's, etc.)"
"Señor Montóya's resüme isn't ápropos."
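If you want the same matching as a reusable routine, here is a minimal sketch. It assumes the script is saved as UTF-8 (hence `use utf8`) and that the input lines are already decoded character strings. The name `count_words` is made up for illustration and is not from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8

# Same pattern as above: a maximal run of alphanumerics.
my $word = qr/(?<!\p{Alnum})\p{Alnum}+(?!\p{Alnum})/;

# count_words is a hypothetical helper name. It takes decoded character
# strings and returns a hash reference mapping each lowercased word
# (with at most one internal apostrophe) to its frequency.
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my $lower = lc $line;
        while ($lower =~ /($word(?:'$word)?)/g) {
            $count{$1}++;
        }
    }
    return \%count;
}

my $count = count_words(
    q{"Hello World!"},
    q{"Señor Montóya's resüme isn't ápropos."},
);

# Encode on output so the non-ASCII words print correctly.
binmode STDOUT, ':encoding(UTF-8)';
printf "%15s %5d\n", $_, $count->{$_} for sort keys %$count;
```

Returning a hash reference keeps the counting separate from the printing, so the caller can sort by frequency or merge counts from several files instead.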
Re^2: stripped punctuation
by thealienz1 (Pilgrim) on Oct 07, 2005 at 05:34 UTC