Greetings all,
There are many good comments here already, but I thought I might give this one a shot. Here is the approach I would try.
- Create a hash keyed by each of the words in your file; the values will be a count of how many times each word (key) appears.
- Test that you successfully opened your file.
- Once it is open, read the file one line at a time with a while(<FILEHANDLE>){ #logic } loop.
- Lowercase all the characters in the line.
- In each line, replace every run of non-word characters with a single space (in case someone did not add a space after a period or between commas). This could be where you deal with your apostrophes as well.
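- Split the line into words. \b is the regex word-boundary assertion, but since the non-word characters are now spaces, splitting on whitespace (split ' ', $line) is simpler and does not hand the spaces back to you as list elements.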
- Go through the split list word by word. If a word is longer than four characters and is already in the hash, increment (++) the hash element keyed by that word; otherwise add the key to the hash and initialize its value to one. (In Perl, $count{$word}++ on a missing key sets it to one, so the explicit check is optional.)
- Once all lines are done, sort the hash keys by their values. The "sort keys" question is a good discussion of how you can do that.
- Print the top ten.
- Marvel at the power of Perl.
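The steps above could be sketched in Perl roughly like this. It is a minimal sketch, not a definitive implementation. The subroutine name top_words is my own, and so are two choices: simply deleting apostrophes (so "don't" becomes "dont") and breaking ties in the count alphabetically so the top ten is stable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count words longer than four characters in a file and return the
# top $n as [word, count] pairs, most frequent first.
# top_words is a hypothetical name, not from any module.
sub top_words {
    my ($path, $n) = @_;
    my %count;    # word => number of times it appears

    # Test that the file opened; die with the OS error if it did not.
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";

    while (my $line = <$fh>) {
        $line = lc $line;            # lowercase the whole line
        $line =~ s/'//g;             # assumption: drop apostrophes ("don't" -> "dont")
        $line =~ s/\W+/ /g;          # each run of non-word chars -> one space
        for my $word (split ' ', $line) {    # split on whitespace
            next unless length($word) > 4;
            $count{$word}++;         # a missing key starts undefined, so ++ makes it 1
        }
    }
    close $fh;

    # Sort keys by count (descending), ties broken alphabetically.
    my @sorted = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    $n = @sorted if @sorted < $n;    # fewer distinct words than requested
    return map { [ $_, $count{$_} ] } @sorted[0 .. $n - 1];
}

# Only run as a script when a file name is given on the command line.
if (@ARGV) {
    printf "%-20s %d\n", @$_ for top_words($ARGV[0], 10);
}
```

Run it as perl wordcount.pl yourfile.txt (the script name is just an example). Keep in mind that \w also matches digits and underscores, so something like route_66 is counted as a single word.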