in reply to tutelage needed

Greetings all,
Many good comments so far, so I thought I might give this one a shot. Here is the methodology I would try.
  1. Create a hash keyed by each of the words in your file; the values will be a count of how many times each word (key) appears.
  2. Test that you successfully open your file.
  3. Once it is open, read the lines of the file one at a time with a while (<FILEHANDLE>) { #logic } loop.
  4. Lowercase all the characters in the line.
  5. In each line, replace every run of non-word characters with a single space (in case someone left out the space after a period or a comma). This could be where you deal with your apostrophes as well.
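  6. Split the line on whitespace. (\b is the regex word-boundary assertion, but splitting on it would also hand back the spaces between words as list elements, so whitespace is the simpler choice here.)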
  7. Go through the split list word by word. Skip any word of four characters or fewer; for each remaining word, increment (++) the hash element keyed by that word. Perl creates a missing key automatically and the first ++ sets its value to one, so no separate check for an existing key is needed.
  8. Once all lines are done, sort the hash keys by their values. The sort keys question is a good discussion of how you can do that.
  9. Print the top ten.
  10. Marvel at the power of Perl.
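
For what it's worth, the steps above could be sketched roughly like this. Treat it as a starting point rather than the one right answer: the sub names count_words and top_words are my own invention, the file name is assumed to come from the command line, and simply deleting apostrophes (so "don't" becomes "dont") is just one possible way to handle them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count words longer than $min_len characters read from an open filehandle.
# Returns a reference to a hash of word => count.
# (count_words is a hypothetical name, not from the original post.)
sub count_words {
    my ($fh, $min_len) = @_;
    my %count;                                    # step 1: word => count
    while (my $line = <$fh>) {                    # step 3: one line at a time
        $line = lc $line;                         # step 4: lowercase
        $line =~ s/'//g;                          # assumption: drop apostrophes
        $line =~ s/\W+/ /g;                       # step 5: non-word runs => one space
        for my $word (split ' ', $line) {         # step 6: split on whitespace
            next unless length($word) > $min_len; # step 7: skip short words
            $count{$word}++;                      # a missing key starts at 1
        }
    }
    return \%count;
}

# Return up to $n words, most frequent first; ties are broken alphabetically.
# (top_words is a hypothetical name as well.)
sub top_words {
    my ($count, $n) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} or $a cmp $b }
                 keys %$count;                    # step 8: sort keys by value
    $n = @sorted if $n > @sorted;                 # fewer words than requested
    return @sorted[0 .. $n - 1];
}

# Only runs when a file name is supplied on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";   # step 2
    my $count = count_words($fh, 4);
    close $fh;
    printf "%-20s %d\n", $_, $count->{$_} for top_words($count, 10);  # step 9
}
```

Saved under a name of your choosing, it would be run as perl yourscript.pl yourfile.txt. The minimum length and the number of words printed are passed as arguments so they are easy to change.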

Re: Re: tutelage needed
by ctp (Beadle) on Jan 04, 2004 at 06:07 UTC
    Awesome stuff, and much help and idea fodder. I will try some of them out as soon as I can. I followed the sort keys link just now and made use of some info there. Thanks!