Yet another way to do it. This one is written to stop at the ends of sentences (sort of ;) ), so phrases don't count if they run across a '.', '!' or '?' boundary. It probably would have been better to use some modules, but it was easy enough to hack together as is. $stop_words should be an array ref of stop words, and $key_phrases is a hash ref that accumulates the weighted phrase counts and is returned.

sub process_keywords {
    my ($text, $weight, $stop_words, $phrase_length, $key_phrases) = @_;

    $text =~ s/&[a-z]+;*/ /g;   # drop HTML entities such as &amp;
    $text =~ s/[",']//g;        # drop quotes and commas
    $text =~ s/[.!?]/ . /g;     # mark sentence ends with a lone "." token
    $text =~ s/\s+/ /g;

    my @words = grep { length($_) } map { lc($_) } split(' ', $text);
    my %stops = map { $_ => 1 } @$stop_words;

    for my $word (0 .. $#words) {
        # A phrase never starts on a sentence marker or a stop word.
        next if $words[$word] eq '.' or exists $stops{ $words[$word] };

        # $length is the number of extra words, so phrases run
        # from 1 to $phrase_length + 1 words.
        for my $length (0 .. $phrase_length) {
            my $last = $words[$word + $length];
            last if !defined $last or $last eq '.';   # end of text or sentence
            next if exists $stops{$last};             # never end on a stop word
            my $phrase = join(' ', @words[$word .. $word + $length]);
            $key_phrases->{$phrase} += $weight;
        }
    }
    return $key_phrases;
}
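For comparison, here is a minimal sketch of a stricter take on the sentence boundary: split the text into sentences first, then build phrases inside each sentence, so nothing can cross a '.', '!' or '?'. The name sentence_phrases and its $max_words parameter are made up for this example and are not part of the code above; it also counts plain occurrences instead of adding a weight.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count phrases of
# 1 .. $max_words words, never letting a phrase span a . ! ? boundary.
sub sentence_phrases {
    my ($text, $stop_words, $max_words) = @_;
    my %stop = map { lc($_) => 1 } @$stop_words;
    my %count;

    # Split into sentences first, so no phrase can cross a boundary.
    for my $sentence (split /[.!?]+/, $text) {
        (my $clean = lc $sentence) =~ s/[",']//g;    # same quote stripping as above
        my @words = grep { length($_) } split /[^a-z0-9]+/, $clean;

        for my $start (0 .. $#words) {
            next if $stop{ $words[$start] };         # never start on a stop word
            for my $end ($start .. $start + $max_words - 1) {
                last if $end > $#words;              # ran off the sentence
                next if $stop{ $words[$end] };       # never end on a stop word
                $count{ join ' ', @words[$start .. $end] }++;
            }
        }
    }
    return \%count;
}

my $counts = sentence_phrases('The quick fox. The fox runs!', ['the'], 2);
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```

With that input it counts "fox" twice and "fox runs", "quick", "quick fox" and "runs" once each. There is no "fox the" entry, because the two sentences are processed separately.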

___________
Eric Hodges

In reply to Re: Word density by eric256
in thread Word density by Anonymous Monk
