I wrote a set of scripts that will automatically find rare words in a book or text. 1. The first script will FTP a very large number of ascii coded classic books from the gutenberg project (www.gutenberg.org). 2. The second one computes a histogram of word frequencies for all those books. 3. The third one takes the text where one wants to find rare words. It will start by showing all the words in it with count 0 in the histogram, then the ones with count 1 and so on. The user chooses manually which words he wants to include in the glossary and then chooses to stop as the scripts starts showing words with higher counts. 4. The chosen words are looked up automatically on web dictionary. 5. We have our glossary ready! The next step is unimplemented but what follows is to generate a TeX file for type-setting the ascii book with the dictionary terms as footnotes or as a glossary on the back.

In reply to Glossary maker by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.