Wise monks
I am trying to put together a script that will go through a text file and show me the top 5 words sorted by occurrence. Here is what I have so far:
    while (<>) {
        chop;
        my @words = split;
        foreach $wd (@words) {
            next if length($wd) < 5;
            $count{$wd}++;
        }
    }
    foreach $w (keys %count) {
        print "$count{$w} $w\n";
    }
As you can see, I am only interested in words over 5 characters in length. I am unsure how to sort this by number of occurrences. I am also having problems with punctuation showing up in my results. Here is an example:
    2 usurpations,
    2 purpose
    1 people.
    1 obstructed
    1 formidable
    1 obstructing
    1 uncomfortable,
Is this because I am using split? Is there a better way to go about this? I am sure I will start missing words that have apostrophes too. Also, how should I sort the counts, and how can I then take only the top 5?
Any help or advice is appreciated.
Many thanks,
ghettofinger
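For reference, here is one possible way to approach the three parts of the question (punctuation, sorting, top 5). It is a minimal sketch, not a definitive answer. A bare split breaks on whitespace only, so trailing commas and periods stay attached to the words; the sketch matches words with a regex instead. The sub names count_words and top_words, the sample text, the decision to lowercase every word, and the rule that a "word" is a run of ASCII letters with optional internal apostrophes are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Count words of at least $min_len characters in a string.
# Returns a hash reference mapping lowercased word => count.
sub count_words {
    my ($text, $min_len) = @_;
    my %count;
    # Assumed word definition: letters, with internal apostrophes
    # allowed (e.g. "don't"); surrounding punctuation is never matched.
    while ($text =~ /([A-Za-z]+(?:'[A-Za-z]+)*)/g) {
        my $word = lc $1;    # assumption: "People" and "people" are the same word
        next if length($word) < $min_len;
        $count{$word}++;
    }
    return \%count;
}

# Return up to $n words, most frequent first; ties are broken alphabetically.
sub top_words {
    my ($count, $n) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
    $n = @sorted if $n > @sorted;    # fewer than $n distinct words available
    return @sorted[0 .. $n - 1];
}

# Hypothetical sample text, standing in for the contents of the input file.
my $sample = "usurpations, purpose usurpations. People people. people purpose obstructed";

my $counts = count_words($sample, 5);
print "$counts->{$_} $_\n" for top_words($counts, 5);
```

To run this against real files rather than the sample string, the text could be slurped with something like `my $text = do { local $/; <> };` and passed to count_words in place of $sample. The minimum length of 5 mirrors the `length($wd) < 5` check in the original code.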