Yet another way to do it. This one is written to stop at the end of sentences (sort of ;) ), so a phrase doesn't count if it crosses a ".", "!" or "?" boundary. It probably would have been better to use some modules, but it was easy enough to hack together as is. $stop_words should be an array ref containing the stop words, and $phrase_length is the number of extra words allowed after the first one (so 2 gives phrases of up to three words).
sub process_keywords {
    my ($text, $weight, $stop_words, $phrase_length, $key_phrases) = @_;
    $key_phrases ||= {};

    # Strip HTML entities, then quotes and commas.
    $text =~ s/&[a-z]+;*/ /g;
    $text =~ s/[",']//g;
    # Turn every sentence terminator into a standalone "." token.
    $text =~ s/[.!?]/ . /g;
    $text =~ s/\s+/ /gs;

    my @words = grep { length } map { lc } split ' ', $text;
    my %stops = map { $_ => 1 } @$stop_words;

    for my $word (0 .. $#words) {
        # A phrase may not start with a stop word or a sentence break.
        next if $stops{ $words[$word] } or $words[$word] eq '.';
        for my $length (0 .. $phrase_length) {
            my $next = $words[$word + $length];
            last unless defined $next;    # ran off the end of the text
            last if $next eq '.';         # don't cross a sentence boundary
            next if $stops{$next};        # don't end a phrase on a stop word
            my $phrase = join ' ', @words[$word .. $word + $length];
            $key_phrases->{$phrase} += $weight;
        }
    }
    return $key_phrases;
}
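If pulling in modules is still not an option, another way to get the same "stop at the sentence" behaviour is to split the text into sentences first and only build phrases inside each one. The sketch below is a hypothetical alternative, not the code above: the name count_phrases_by_sentence is made up, it assumes a sentence ends at any run of ".", "!" or "?", it counts each occurrence once instead of adding a $weight, and its $max_words is the total phrase length rather than the number of extra words.

```perl
use strict;
use warnings;

# Hypothetical alternative: split into sentences first, then collect
# phrases of up to $max_words words inside each sentence. A phrase may
# not start or end on a stop word.
sub count_phrases_by_sentence {
    my ($text, $stop_words, $max_words) = @_;
    my %stops = map { lc($_) => 1 } @$stop_words;
    my %count;

    # Assumption: a sentence ends at any run of '.', '!' or '?'.
    for my $sentence (split /[.!?]+/, lc $text) {
        $sentence =~ s/[",']//g;    # drop quotes and commas, like process_keywords
        my @words = split ' ', $sentence;
        for my $start (0 .. $#words) {
            next if $stops{ $words[$start] };
            for my $end ($start .. $start + $max_words - 1) {
                last if $end > $#words;          # never run past this sentence
                next if $stops{ $words[$end] };  # don't end on a stop word
                $count{ join ' ', @words[$start .. $end] }++;
            }
        }
    }
    return \%count;
}

my $counts = count_phrases_by_sentence("The cat sat. The cat ran!", ['the'], 2);
print "$_ => $counts->{$_}\n" for sort keys %$counts;
# prints:
# cat => 2
# cat ran => 1
# cat sat => 1
# ran => 1
# sat => 1
```

Note that "sat the" never shows up, because "sat" and "the" are in different sentences.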
In reply to Re: Word density by eric256, in thread Word density by Anonymous Monk