The Moby Lexicon project, now concluded, offers several different slices of the dictionary. One of them lists the most common words found in a couple of different samples, ranked by prevalence. This kind of data is very useful for certain kinds of search analysis: a match that hits a less-common word can be ranked higher than a match on mundane words. I was also doing some work on protocol compression and canonical word numbering. The Moby Lexicon can be found with Google, and it has other goodies such as parts of speech, hyphenation, common personal names by gender, and a few studies of other languages.
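
To make the search-ranking idea concrete, here is a minimal sketch in Python. It assumes you already have a word list ordered from most to least common. The tiny `COMMON_WORDS` list, the function names, and the logarithmic weighting are all illustrative assumptions of mine, not anything taken from the Moby files themselves.

```python
import math

# Hypothetical frequency-ranked list, most common word first.
# A real list would be loaded from a word-frequency file; these
# entries are placeholders for illustration only.
COMMON_WORDS = ["the", "of", "and", "to", "a", "in", "is", "that"]

# Map each word to its 1-based rank (1 = most common).
RANK = {word: i + 1 for i, word in enumerate(COMMON_WORDS)}

# Words missing from the list are treated as rarer than any listed word.
UNLISTED_RANK = len(COMMON_WORDS) + 1


def rarity(word):
    """Return a weight that grows as the word gets less common."""
    return math.log(1 + RANK.get(word.lower(), UNLISTED_RANK))


def score(query_words, document_words):
    """Sum the rarity weights of the distinct query words found in the document."""
    doc = {w.lower() for w in document_words}
    query = {w.lower() for w in query_words}
    return sum(rarity(w) for w in query if w in doc)


def rank_documents(query, documents):
    """Order documents so that matches on rarer words come first."""
    query_words = query.split()
    return sorted(
        documents,
        key=lambda d: score(query_words, d.split()),
        reverse=True,
    )


if __name__ == "__main__":
    docs = ["the cat sat", "a lexicon of words"]
    for doc in rank_documents("the lexicon", docs):
        print(doc)
```

With this weighting, a document that matches only "the" scores lower than one that matches "lexicon", because "lexicon" is absent from the common-word list and so gets the largest weight. The logarithm is just one reasonable way to keep very rare words from swamping everything else; other weightings would work too.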