The problem with this code is that as the number of possible conjugated forms grows, and also when it's necessary to check for prefixes or suffixes, the analysis takes very long if the dictionary is large (specifically, 16008 words).

You may wish to look into a concept called 'stemming'. Basically, the program reads in each term and removes prefixes, suffixes, numeric quantifiers, etc., to get just the 'stem' or base of the word. You then compare only the base word, rather than all of its possible permutations.
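Here is a minimal sketch of the idea, written in Python for brevity (a Perl version would use a hash in the same way). The prefix and suffix lists, the three-letter minimum stem length, and the function names `stem`, `build_stem_index` and `lookup` are all hypothetical and chosen for illustration. A real stemmer, such as the Porter algorithm for English, uses much richer rules. The key point is that each dictionary word is reduced to its stem once, up front. After that, each input word costs one stem computation and one hash lookup, instead of generating every conjugated form.

```python
# Hypothetical affix lists, for illustration only; real stemmers use
# far richer, language-specific rules.
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM = 3  # assumed minimum length of what is left after stripping


def stem(word: str) -> str:
    """Strip at most one known prefix and one known suffix from word."""
    w = word.lower()
    for p in PREFIXES:
        # Only strip if a reasonably long base would remain.
        if w.startswith(p) and len(w) - len(p) >= MIN_STEM:
            w = w[len(p):]
            break
    for s in SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= MIN_STEM:
            w = w[: -len(s)]
            break
    return w


def build_stem_index(dictionary):
    """Map each stem to the set of dictionary words that reduce to it (one pass)."""
    index = {}
    for entry in dictionary:
        index.setdefault(stem(entry), set()).add(entry)
    return index


def lookup(word, index):
    """Stem the input once, then do a single hash lookup."""
    return index.get(stem(word), set())


if __name__ == "__main__":
    index = build_stem_index(["walk", "talk", "sing"])
    print(lookup("rewalked", index))
    print(lookup("singing", index))
```

The minimum-length check keeps short words such as "sing" from being over-stripped to "s". Naive affix stripping like this will still over- and under-stem some words, which is why established stemmers add many special-case rules.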

This may also reduce the number of terms that you're tracking in your dictionary. Because you're trying to be language-agnostic, this may be more difficult, as the rules for stemming depend on the language. Also, English is a notoriously difficult language to do this in, because its terms come from multiple languages and tend to keep the rules of their original language.
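One way to stay language-agnostic is to keep the stripping logic generic and load the affix rules per language. The sketch below assumes that approach. The `RULES` table, its "en" and "es" entries, and the `make_stemmer` helper are hypothetical, and the affixes listed are a tiny illustrative sample, not a complete rule set for either language.

```python
from typing import Callable, Dict, Tuple

# Hypothetical per-language rule tables: (prefixes, suffixes).
# Illustrative samples only, not complete rules for either language.
RULES: Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...]]] = {
    "en": (("un", "re"), ("ing", "ed", "s")),
    "es": ((), ("ando", "iendo", "ado", "ido", "s")),
}


def make_stemmer(lang: str, min_stem: int = 3) -> Callable[[str], str]:
    """Build a stemmer from the affix table for `lang`."""
    prefixes, suffixes = RULES[lang]
    # Try longer affixes first so, e.g., "iendo" is preferred over "s".
    prefixes = sorted(prefixes, key=len, reverse=True)
    suffixes = sorted(suffixes, key=len, reverse=True)

    def stem(word: str) -> str:
        w = word.lower()
        for p in prefixes:
            if w.startswith(p) and len(w) - len(p) >= min_stem:
                w = w[len(p):]
                break
        for s in suffixes:
            if w.endswith(s) and len(w) - len(s) >= min_stem:
                w = w[: -len(s)]
                break
        return w

    return stem


if __name__ == "__main__":
    en, es = make_stemmer("en"), make_stemmer("es")
    print(en("rewalking"), es("hablando"), es("comiendo"))
    print(en("went"))  # irregular: no affix rule maps it to "go"
```

The last line shows the English problem mentioned above. Irregular forms like "went" share no affix with "go", so an affix-only rule table leaves them untouched, and they need a separate exception list.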


In reply to Re: reading dictionary file -> morphological analyser by jhourcle
in thread reading dictionary file -> morphological analyser by pc2
