in reply to Re: Re: Common Words, Perl Keywords
in thread Common Words, Perl Keywords

The Moby Lexicon project, now concluded, offers several different slices of the dictionary. Among them are lists of the most common words found in a couple of different samples, ranked by prevalence. This kind of data is very useful for certain kinds of search analysis: rank a match that hits a less-common word higher than a match on mundane words. I was doing some work on protocol compression and canonical word numbering as well. The Moby Lexicon can be found with Google, and has other goodies such as parts of speech, hyphenation, common personal names by gender, and a few studies of other languages.
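
To make the ranking idea concrete, here is a rough sketch in Python of scoring matches by word rarity. It is only an illustration under stated assumptions: the tiny `COMMON_WORDS` list, the logarithmic weight, and the function names are all made up for the example, and none of it reflects the actual layout of the Moby files.

```python
import math

# Hypothetical frequency-ranked list, most common word first.
# A real list would be loaded from a word-frequency file; these
# entries are illustrative only.
COMMON_WORDS = ["the", "of", "and", "to", "a", "in", "is", "perl", "keyword"]
RANK = {word: i + 1 for i, word in enumerate(COMMON_WORDS)}


def word_weight(word):
    """Return a weight that grows as the word gets less common.

    Words missing from the list are treated as rarer than any listed word.
    The log scale is an assumption chosen to damp the spread of weights.
    """
    rank = RANK.get(word.lower(), len(COMMON_WORDS) + 1)
    return math.log(1 + rank)


def score(query, document):
    """Sum the weights of the distinct query words found in the document."""
    doc_words = set(document.lower().split())
    query_words = set(query.lower().split())
    return sum(word_weight(w) for w in query_words if w in doc_words)


def rank_matches(query, documents):
    """Order documents so that hits on less-common words come first."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)
```

With a query such as "the keyword", a document that contains "keyword" sorts ahead of one that only contains "the", because "keyword" sits much further down the frequency list.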

--
[ e d @ h a l l e y . c c ]
