in reply to Word density
Here are two basic regexps. One that's inclusive:
```perl
$word = qr/
    [[:alpha:]]          # Start with a letter.
    (?:
        [[:^space:]]*    # Hyphens, apostrophes, etc.
        [[:alpha:]]      # Don't end on a punctuation mark.
    )?                   # Catch single letter words.
/x;
```
One that's restrictive:
```perl
$word = qr/
    [[:alpha:]]          # Start with a letter.
    (?:
        [[:alpha:]'-]*   # Letters, apostrophes and hyphens.
        [[:alpha:]]      # Don't end on a punctuation mark.
    )?                   # Catch single letter words.
/x;
```
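To see where the two patterns disagree, here is a small standalone sketch that runs compact, comment-free versions of both on a made-up sample string. The variable names `$inclusive`, `$restrictive` and `$text` are only for this illustration. The inclusive pattern keeps any run of non-space characters between two letters together, while the restrictive one splits on anything other than letters, apostrophes and hyphens.

```perl
use strict;
use warnings;

# Compact versions of the inclusive and restrictive patterns above.
my $inclusive   = qr/[[:alpha:]](?:[[:^space:]]*[[:alpha:]])?/;
my $restrictive = qr/[[:alpha:]](?:[[:alpha:]'-]*[[:alpha:]])?/;

# Hypothetical sample text.
my $text = "Don't e-mail AT&T or foo.com!";

# With no capture groups, //g in list context returns every full match.
my @inclusive   = $text =~ /$inclusive/g;
my @restrictive = $text =~ /$restrictive/g;

print join('|', @inclusive),   "\n";   # Don't|e-mail|AT&T|or|foo.com
print join('|', @restrictive), "\n";   # Don't|e-mail|AT|T|or|foo|com
```

Both agree on "Don't" and "e-mail"; they differ on tokens such as "AT&T" and "foo.com".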
Here's how you use them:
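```perl
my %hash;     # Counts of single words and two- and three-word phrases.
my $last1;    # The previous word.
my $last2;    # The word before that.

while ($content =~ /($word)/g) {
    my $w = $1;
    ++$hash{$w};                                     # One word.
    ++$hash{"$last1 $w"}        if defined $last1;   # Two-word phrase.
    ++$hash{"$last2 $last1 $w"} if defined $last2;   # Three-word phrase.
    $last2 = $last1;
    $last1 = $w;
}
```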
Update: Instead of just returning the data, I've updated my code to actually process it.
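Once `%hash` is filled, turning the counts into densities is a matter of dividing by the number of words seen. The sketch below is one possible way to do that, assuming "density" means occurrences per 100 words of input; the sample `$content`, the `$total` counter, `@ranked` and the cut-off of five entries are all hypothetical choices for illustration, not part of the code above.

```perl
use strict;
use warnings;

# The restrictive pattern from above.
my $word = qr/[[:alpha:]](?:[[:alpha:]'-]*[[:alpha:]])?/;

# Hypothetical sample input.
my $content = 'the cat sat on the mat the cat ran';

my (%hash, $last1, $last2);
my $total = 0;    # Number of single words seen.
while ($content =~ /($word)/g) {
    my $w = $1;
    ++$total;
    ++$hash{$w};
    ++$hash{"$last1 $w"}        if defined $last1;
    ++$hash{"$last2 $last1 $w"} if defined $last2;
    ($last2, $last1) = ($last1, $w);
}

# Most frequent first; ties broken alphabetically.
my @ranked = sort { $hash{$b} <=> $hash{$a} || $a cmp $b } keys %hash;

# Density: occurrences per 100 words of input, top five entries.
my $top = @ranked < 5 ? scalar @ranked : 5;
for my $phrase (@ranked[0 .. $top - 1]) {
    printf "%-12s %d  %.1f%%\n", $phrase, $hash{$phrase}, 100 * $hash{$phrase} / $total;
}
```

Here "the" appears 3 times in 9 words (33.3%), and "cat" and "the cat" twice each (22.2%).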