Here's my version of your code:
#!/usr/bin/perl -w
# Find most frequent signals amidst political noise
use strict;
$|=1;
my (@noise,%stop,%count,$word,$total,$percent);

# stop words that are ignored when counting
@noise=qw/
  a about after all also among an and any are as at be been before being
  between both but by can could during each every for from further had has have
  his i if in into is it its made many may more most must my not of on only or
  other our over own shall should since so some still such than that the their
  them there these they this those through to under upon us was we well were what
  when where which while who will with within without would you your
/;
foreach(@noise) { $stop{$_}=1; }

# single file of State of Union addresses
open(IN,"<soufile.txt") or die "Cannot open soufile.txt: $!";
foreach(<IN>){
  chomp;
  s/&(.*?);//g;        # drop HTML entities
  s/\s+/ /g;           # collapse runs of whitespace
  s/[^A-Za-z ]//g;     # keep only letters and spaces
  s/^ | $//g;          # trim leading and trailing space
  foreach(split(/ /,$_)){
    $word=lc($_);
    $word=~s/[^A-Za-z]//g;
    next if !$word||$stop{$word};
    $total++;
    $count{$word}++;
  }
}
close(IN);

# most frequent words first
foreach(sort {$count{$b}<=>$count{$a}} keys %count){
  $percent=int(10000 * $count{$_} / $total)/100;
  print "$count{$_} :$_ ($percent \%)\n";
}
For George W. Bush's last address it gives the following (only words above 0.5% are shown):
76 :applause (3.46 %)
33 :america (1.5 %)
19 :security (0.86 %)
17 :world (0.77 %)
15 :american (0.68 %)
14 :terror (0.63 %)
13 :good (0.59 %)
13 :new (0.59 %)
12 :people (0.54 %)
12 :weapons (0.54 %)
12 :war (0.54 %)
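The script itself prints every counted word, so the 0.5% cut-off above was applied to the output afterwards. If you would rather have the script do the trimming, something along these lines might work. It is only a sketch: `frequent_words` and `$min_percent` are names I made up for illustration, and the counts in the example are invented, not taken from any address.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return report lines
# for every word whose share of the total is strictly above $min_percent.
sub frequent_words {
    my ($count, $total, $min_percent) = @_;
    my @lines;
    return @lines unless $total;    # nothing counted, nothing to report

    # Sort by count (descending), then alphabetically so ties come out
    # in a predictable order.
    foreach my $word (sort { $count->{$b} <=> $count->{$a} || $a cmp $b }
                      keys %$count) {
        # Same two-decimal truncation as the original print loop.
        my $percent = int(10000 * $count->{$word} / $total) / 100;
        last if $percent <= $min_percent;   # sorted, so the rest are lower too
        push @lines, "$count->{$word} :$word ($percent %)";
    }
    return @lines;
}

# Made-up counts purely for illustration.
my %count = (america => 6, world => 3, tax => 1);
my $total = 10;
print "$_\n" for frequent_words(\%count, $total, 20);
```

With the made-up counts this prints the lines for "america" and "world" and drops "tax". In the script above you would call it as `frequent_words(\%count, $total, 0.5)` in place of the final print loop.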

In reply to Re: Conversation Pools by osama
in thread Conversation Pools by astrobio
