in reply to Word density

Yet another way to do it. This one is written to stop at the end of sentences (sort of. ;) ), so phrases don't count if they run across a ., ! or ? boundary. Each sentence end is turned into a standalone "." token, so put "." in the stop words if you don't want it to start or end a phrase. It probably would have been better to use some modules, but it was easy enough to hack together as is. $stop_words should be an array ref with stop words in it.

    sub process_keywords {
        my ($text, $weight, $stop_words, $phrase_length, $key_phrases) = @_;

        $text =~ s/&[a-z]+;*/ /g;    # drop HTML entities such as &amp;
        $text =~ s/[",']//g;         # drop quotes and commas
        $text =~ s/[.!?]/ . /g;      # mark sentence ends with a standalone "."
        $text =~ s/\s+/ /gs;         # collapse runs of whitespace

        my @words = map { lc($_) } split(' ', $text);
        @words = grep { length($_) } @words;

        my $stops = { map { $_ => 1 } @$stop_words };

        for my $word (0 .. $#words) {
            # a phrase may not start with a stop word
            next if exists $stops->{$words[$word]};

            # $length is the number of extra words after the first one
            for my $length (0 .. $phrase_length) {
                next unless defined $words[$word + $length];
                # a phrase may not end with a stop word
                next if exists $stops->{$words[$word + $length]};

                my $phrase = join(' ', @words[$word .. $word + $length]);
                $key_phrases->{$phrase} += $weight;
            }
        }

        return $key_phrases;
    }
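Here is a rough sketch of how the sub might be called. The sample text, the stop list, and the choice of 1 for both $weight and $phrase_length are made up for illustration; the sub itself is repeated (with the syntax tidied) only so the snippet runs on its own.

```perl
use strict;
use warnings;

# Same logic as the sub in the post, repeated so this example is self-contained.
sub process_keywords {
    my ($text, $weight, $stop_words, $phrase_length, $key_phrases) = @_;

    $text =~ s/&[a-z]+;*/ /g;    # drop HTML entities
    $text =~ s/[",']//g;         # drop quotes and commas
    $text =~ s/[.!?]/ . /g;      # sentence ends become a standalone "."
    $text =~ s/\s+/ /gs;         # collapse whitespace

    my @words = map { lc($_) } split(' ', $text);
    my $stops = { map { $_ => 1 } @$stop_words };

    for my $word (0 .. $#words) {
        next if exists $stops->{$words[$word]};
        for my $length (0 .. $phrase_length) {
            next unless defined $words[$word + $length];
            next if exists $stops->{$words[$word + $length]};
            my $phrase = join(' ', @words[$word .. $word + $length]);
            $key_phrases->{$phrase} += $weight;
        }
    }
    return $key_phrases;
}

# Hypothetical input: "." is listed as a stop word so that it can
# neither start nor end a phrase.
my $counts = process_keywords(
    'The cat sat. The cat ran!',    # text
    1,                              # weight added per occurrence
    [ 'the', '.' ],                 # stop words
    1,                              # up to one extra word per phrase
    {},                             # start with an empty tally
);

# Highest score first, then alphabetical.
for my $phrase (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
    print "$counts->{$phrase}\t$phrase\n";
}
# prints:
# 2    cat
# 1    cat ran
# 1    cat sat
# 1    ran
# 1    sat
```

Passing the same hash ref back in on a later call (say, with a bigger $weight for title text) accumulates scores across calls. Note that with a larger $phrase_length a phrase can still contain a "." in the middle, since only the first and last words are checked against the stop list.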

___________
Eric Hodges