in reply to reading dictionary file -> morphological analyser
the problem with this code is that, as the number of possible conjugated forms gets larger, and also if it's necessary to check for prefixes or suffixes, the analysis takes very long if the dictionary is too big (specifically, 16008 words).
You may wish to look into a concept called 'stemming'. Basically, the program reads in each term and removes prefixes, suffixes, numeric quantifiers, etc., to leave just the 'stem' or base of the word. You then compare only the base word against the dictionary, rather than all of the possible permutations.
This may also reduce the number of terms that you're tracking in your dictionary. As you're trying to be language agnostic, this may be more difficult, as the rules for stemming will be dependent upon the language. Also, English is a notoriously difficult language to do this in, as its terms come from multiple languages and tend to keep the rules of the language they came from.
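
To make the idea concrete, here is a rough sketch in Python (not the language of the original code, and not a real stemmer). The affix lists, the `MIN_STEM` cutoff, and the function names `stem`, `build_index`, and `lookup` are all made up for illustration; a real rule set would have to be written per language. It strips at most one prefix and one suffix, then looks words up by stem in a hash, so each lookup costs about the same no matter how many conjugated forms exist.

```python
# Hypothetical affix lists, for illustration only.
# Real stemming rules depend on the language being analysed.
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM = 3  # assumed cutoff: never strip a word below this length


def stem(word):
    """Strip at most one known prefix and one known suffix."""
    word = word.lower()
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[:-len(suffix)]
            break
    return word


def build_index(dictionary_words):
    """Map each stem to the set of dictionary entries that share it."""
    index = {}
    for entry in dictionary_words:
        index.setdefault(stem(entry), set()).add(entry)
    return index


def lookup(index, word):
    """Return the dictionary entries whose stem matches the word's stem."""
    return index.get(stem(word), set())


if __name__ == "__main__":
    index = build_index(["walk", "lock"])
    for form in ("walking", "walked", "unlocked", "zebra"):
        print(form, "->", sorted(lookup(index, form)))
```

Rules this crude will mangle plenty of words (the "re" rule wrongly chops the front off "reading", for example), which is the language-dependence problem mentioned above. An established stemming library for the target language is likely a better starting point than hand-rolled lists.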