I wrote a set of scripts that automatically find rare words in a book or other text:

1. The first script downloads, via FTP, a very large number of ASCII-encoded classic books from Project Gutenberg (www.gutenberg.org).
2. The second computes a histogram of word frequencies over all those books.
3. The third takes the text in which you want to find rare words. It first shows every word in the text whose count in the histogram is 0, then the words with count 1, and so on. The user manually picks which words to include in the glossary and stops once the script starts showing words with higher counts.
4. The chosen words are looked up automatically in a web dictionary.
5. We have our glossary ready!

The next step is not implemented yet: generating a TeX file that typesets the ASCII book with the dictionary entries either as footnotes or as a glossary at the back.
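Steps 2 and 3 can be sketched roughly as follows. This is a minimal Python sketch, not the original scripts (which are not included in this thread); the tokenizer regex and the names `tokenize`, `build_histogram`, `rare_word_candidates` and `choose_words` are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical tokenizer: lowercase runs of letters, allowing inner
# apostrophes ("o'er"). The original scripts' word rules are not shown.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its words in order."""
    return WORD_RE.findall(text.lower())


def build_histogram(corpus_texts: Iterable[str]) -> Counter:
    """Step 2: count every word occurrence across all corpus books."""
    histogram: Counter = Counter()
    for text in corpus_texts:
        histogram.update(tokenize(text))
    return histogram


def rare_word_candidates(target_text: str, histogram: Counter) -> list[tuple[int, str]]:
    """Step 3 (ordering): the target's distinct words as (count, word), rarest first.

    A Counter returns 0 for words it has never seen, so words absent from
    the corpus come first; ties are broken alphabetically.
    """
    return sorted((histogram[word], word) for word in set(tokenize(target_text)))


def choose_words(candidates: list[tuple[int, str]],
                 ask: Callable[[str], str] = input) -> list[str]:
    """Step 3 (selection): ask about each candidate until the user types 'q'."""
    chosen = []
    for count, word in candidates:
        answer = ask(f"[{count}] {word}  include? (y/n/q) ").strip().lower()
        if answer == "q":  # counts are getting too high; stop here
            break
        if answer == "y":
            chosen.append(word)
    return chosen
```

For example, with a corpus of "the cat sat on the mat" and "the dog sat", the text "The cat perambulated on the mat" yields (0, 'perambulated') first, followed by (1, 'cat'), (1, 'mat'), (1, 'on') and (3, 'the'). Passing `ask` as a parameter lets the interactive prompt be replaced by scripted answers.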
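The unimplemented TeX step could look roughly like this. It is a sketch under the assumption that the finished glossary is a dict mapping lowercase words to definition strings; `tex_escape`, `footnote_first_occurrences` and `glossary_section` are hypothetical names. It shows both layouts mentioned above: a footnote at the first occurrence of each word, or an alphabetical glossary for the back of the book.

```python
import re

# Characters with special meaning in LaTeX running text, mapped to safe forms.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}

# Capturing group so re.split keeps the words as separate list items.
WORD_SPLIT_RE = re.compile(r"([A-Za-z]+(?:'[A-Za-z]+)*)")


def tex_escape(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def footnote_first_occurrences(text: str, glossary: dict[str, str]) -> str:
    """Escape the book text and add a footnote after the first occurrence
    of each glossary word (glossary keys are assumed to be lowercase)."""
    seen = set()
    out = []
    for token in WORD_SPLIT_RE.split(text):
        out.append(tex_escape(token))
        key = token.lower()
        if key in glossary and key not in seen:
            seen.add(key)
            out.append(r"\footnote{" + tex_escape(glossary[key]) + "}")
    return "".join(out)


def glossary_section(glossary: dict[str, str]) -> str:
    """Alternative layout: an alphabetical glossary for the back of the book."""
    lines = [r"\section*{Glossary}", r"\begin{description}"]
    for word in sorted(glossary):
        lines.append(r"  \item[" + tex_escape(word) + "] " + tex_escape(glossary[word]))
    lines.append(r"\end{description}")
    return "\n".join(lines)
```

The output is only a document body; it would still need a preamble such as `\documentclass{book}` and a `\begin{document}` ... `\end{document}` wrapper. Escaping matters because plain ASCII books often contain characters like `%` or `&`, which LaTeX would otherwise treat as commands.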

RE: Glossary maker
by vroom (His Eminence) on Apr 26, 2000 at 03:08 UTC
RE: Glossary maker
by gregorovius (Friar) on Apr 27, 2000 at 00:46 UTC
    Not an anonymous monk any more! I chose the name gregorovius. I just posted the code in the Catacombs, but the site somehow strips all the newlines from it, even though I can see them when I edit it. I don't know whether it has to do with my clunky Netscape 4.07. Has anybody else had this problem?
RE: Glossary maker
by buzzcutbuddha (Chaplain) on Apr 26, 2000 at 16:11 UTC
    I too would like to see those scripts...
RE: Glossary maker
by Simplicus (Monk) on Apr 26, 2000 at 17:36 UTC
    Me too, very much.
    Simplicus
RE: Glossary maker
by Keighvin (Novice) on Apr 26, 2000 at 20:30 UTC
    We need more registered monks to get credit for this kind of excellent craft.