Here are two basic regexps. One that's inclusive:
    $word = qr/
        [[:alpha:]]          # Start with a letter.
        (?:
            [[:^space:]]*    # Hyphens, apostrophes, etc.
            [[:alpha:]]      # Don't end on a punctuation mark.
        )?                   # Catch single-letter words.
    /x;
One that's restrictive:
    $word = qr/
        [[:alpha:]]          # Start with a letter.
        (?:
            [[:alpha:]'-]*   # Allowed characters.
            [[:alpha:]]      # Don't end on a punctuation mark.
        )?                   # Catch single-letter words.
    /x;
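To see how the two patterns differ, here is a small sketch that runs both over the same string. The variable names `$inclusive` and `$restrictive` and the sample sentence are made up for illustration. The patterns are written with the double-bracket POSIX class form (`[[:alpha:]]`, `[[:^space:]]`) and `*` inside the optional group so that two-letter words match whole.

```perl
use strict;
use warnings;

# Inclusive: anything non-space may appear inside a word.
my $inclusive = qr/
    [[:alpha:]]          # Start with a letter.
    (?:
        [[:^space:]]*    # Any non-space characters in the middle.
        [[:alpha:]]      # End on a letter.
    )?
/x;

# Restrictive: only letters, apostrophes and hyphens inside a word.
my $restrictive = qr/
    [[:alpha:]]          # Start with a letter.
    (?:
        [[:alpha:]'-]*   # Allowed characters in the middle.
        [[:alpha:]]      # End on a letter.
    )?
/x;

# Hypothetical sample text, not from the original thread.
my $sample = "Don't e-mail well-known co.uk sites, OK?";

my @inclusive   = $sample =~ /($inclusive)/g;
my @restrictive = $sample =~ /($restrictive)/g;

print join('|', @inclusive),   "\n";
print join('|', @restrictive), "\n";
```

The inclusive pattern keeps `co.uk` as one word, while the restrictive one splits it into `co` and `uk`. Both drop the trailing comma and question mark.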
Here's how you use them:
    my $last1;
    my $last2;
    while ($content =~ /($word)/g) {
        my $word = $1;
        ++$hash{ $word };
        ++$hash{ "$last1 $word"} if defined $last1;
        ++$hash{"$last2 $last1 $word"} if defined $last2;
        $last2 = $last1;
        $last1 = $word;
    }
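For anyone who wants to try it, here is a self-contained sketch of the same loop under `use strict`. The names `$word_re` and `%count`, the sample `$content`, and the sorted report at the end are my additions for illustration. It counts single words plus two- and three-word phrases.

```perl
use strict;
use warnings;

# Restrictive word pattern: letters, apostrophes and hyphens only.
my $word_re = qr/
    [[:alpha:]]
    (?:
        [[:alpha:]'-]*
        [[:alpha:]]
    )?
/x;

# Hypothetical input text.
my $content = "the cat sat on the cat mat";

my %count;
my ($last1, $last2);    # Previous word and the word before it.

while ($content =~ /($word_re)/g) {
    my $word = $1;
    ++$count{$word};                                      # Single word.
    ++$count{"$last1 $word"}        if defined $last1;    # Two-word phrase.
    ++$count{"$last2 $last1 $word"} if defined $last2;    # Three-word phrase.
    ($last2, $last1) = ($last1, $word);
}

# Report, most frequent first, ties broken alphabetically.
for my $phrase (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$count{$phrase}\t$phrase\n";
}
```

With this input, `the`, `cat` and `the cat` each get a count of 2, and every other word or phrase gets 1. Matching is case-sensitive, so apply `lc` to `$1` if `The` and `the` should count together.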
Update: Instead of just returning the data, I've updated my code to actually process it.
In reply to Re: Word density
by ikegami
in thread Word density
by Anonymous Monk