I wrote a set of scripts that will automatically find
rare words in a book or text.
1. The first script downloads, via FTP, a very large number of
ASCII-encoded classic books from Project Gutenberg
(www.gutenberg.org).
2. The second one computes a histogram of word frequencies
for all those books.
3. The third script takes the text in which one wants to find
rare words. It first shows every word in that text whose
count in the histogram is 0, then those with count 1, and
so on. The user manually picks which words to include in
the glossary and stops once the script starts showing words
with higher counts.
4. The chosen words are looked up automatically in a web
dictionary.
5. Our glossary is ready! The next step is not yet
implemented: generating a TeX file that typesets the ASCII
book with the dictionary entries as footnotes or as a
glossary at the back.
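Step 2 (the word-frequency histogram) can be sketched in a few lines of Python. This is not the original script; it assumes the downloaded books are plain-text files on disk, and that a "word" is a run of ASCII letters, optionally with an internal apostrophe, with case folded. Both the tokenizing rule and the function names are illustrative assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed word rule: ASCII letters, optionally joined by internal
# apostrophes ("don't"); text is lowercased so "The" and "the" merge.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return WORD_RE.findall(text.lower())


def build_histogram(paths):
    """Count word occurrences across all the given plain-text files."""
    counts = Counter()
    for path in paths:
        # Gutenberg texts are mostly ASCII; silently drop any stray bytes.
        text = Path(path).read_text(encoding="ascii", errors="ignore")
        counts.update(tokenize(text))
    return counts
```

For example, `build_histogram(Path("books").glob("*.txt"))` would count every word in a hypothetical `books` directory. A `Counter` is convenient here because it returns 0 for words it has never seen, which is exactly the "count 0" case step 3 starts from.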
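The rarest-first ordering in step 3 amounts to grouping the distinct words of the target text by their corpus count and visiting the groups in ascending order. Below is a minimal sketch, assuming the histogram is any word-to-count mapping; `rare_word_groups` and `choose_words` are hypothetical names, and the y/n/q prompt is just one way to let the user stop.

```python
import re
from collections import defaultdict

# Same assumed word rule as for the histogram: lowercase letters with
# optional internal apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def rare_word_groups(text, histogram):
    """Yield (corpus_count, sorted_words) for the distinct words of text,
    rarest first. Words missing from the histogram get count 0."""
    by_count = defaultdict(set)
    for word in set(WORD_RE.findall(text.lower())):
        by_count[histogram.get(word, 0)].add(word)
    for count in sorted(by_count):
        yield count, sorted(by_count[count])


def choose_words(text, histogram):
    """Interactively pick glossary words; answering 'q' stops early."""
    chosen = []
    for count, words in rare_word_groups(text, histogram):
        print(f"--- words with corpus count {count} ---")
        for word in words:
            answer = input(f"{word}? [y/n/q] ").strip().lower()
            if answer == "q":
                return chosen
            if answer == "y":
                chosen.append(word)
    return chosen
```

Because `rare_word_groups` is a generator, later (more common) groups are only produced if the user keeps going, which matches the idea of stopping once the counts get too high.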
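Step 5 is described as unimplemented, so the following is only one possible sketch of the footnote variant. It assumes the glossary is a mapping from word to definition, attaches a `\footnote` to the first occurrence of each glossary word (case-insensitive), and escapes LaTeX special characters in both the book text and the definitions. The function names and the `book` document class are illustrative choices.

```python
import re

# LaTeX special characters and their escaped forms for body text.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}
_SPECIAL_RE = re.compile("|".join(re.escape(c) for c in TEX_SPECIALS))
_WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def tex_escape(s):
    """Escape characters that LaTeX would otherwise interpret."""
    return _SPECIAL_RE.sub(lambda m: TEX_SPECIALS[m.group()], s)


def annotate(text, glossary):
    """Return LaTeX body text where the first occurrence of each glossary
    word is followed by a footnote containing its definition."""
    pending = {w.lower(): d for w, d in glossary.items()}
    out, pos = [], 0
    for m in _WORD_RE.finditer(text):
        key = m.group().lower()
        if key in pending:
            out.append(tex_escape(text[pos:m.end()]))
            out.append(r"\footnote{" + tex_escape(pending.pop(key)) + "}")
            pos = m.end()
    out.append(tex_escape(text[pos:]))
    return "".join(out)


def to_tex_document(text, glossary):
    """Wrap the annotated text in a minimal LaTeX document. Blank lines
    in the ASCII source already act as paragraph breaks in LaTeX."""
    body = annotate(text, glossary)
    return ("\\documentclass{book}\n\\begin{document}\n"
            + body + "\n\\end{document}\n")
```

A back-of-the-book glossary would instead collect the same word/definition pairs into a list typeset after the main text; footnotes are shown here only because they need no extra packages.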