in reply to reading dictionary file -> morphological analyser
...so, what do you prefer? Many lines containing little information each, or fewer lines containing a lot of information each?
To be able to give any useful advice, it would help to know the exact context that this program is going to be used in. For example, are you intending to call it once for every input item? How many input items are you planning to look up? Etc.
Also, as to speed, it's hard to make a precise prediction without knowing all the details (such as hardware being used, complexity of the conjugations, etc.). However, my gut feeling is that precomputing everything will be marginally faster, in particular if you need to compare more than one input item against the full set of words. Generally, if your machine has fast I/O, slurping in the whole precomputed dictionary might not be a problem, while, if you have a fast processor, computing the conjugations / inflections on-the-fly might still be faster after all... If I were you, I would just benchmark the different approaches, and go with whatever turns out to be faster.
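A minimal sketch of such a benchmark follows, assuming a toy dictionary. `BASE_WORDS`, `SUFFIXES` and the suffix rules are hypothetical stand-ins for your real data and conjugation rules. The precomputed variant keeps every inflected form in a set. The on-the-fly variant analyses each input by stripping candidate suffixes and checking whether a known base remains. Both should give identical answers, so only their timings differ:

```python
import timeit

# Hypothetical toy data, for illustration only: replace with your real dictionary.
BASE_WORDS = [f"word{i}" for i in range(2000)]
SUFFIXES = ["", "s", "ed", "ing"]  # hypothetical inflection rules
BASE_SET = set(BASE_WORDS)

# Approach 1: precompute every inflected form once and keep them all in memory.
PRECOMPUTED = {base + suffix for base in BASE_WORDS for suffix in SUFFIXES}

def lookup_precomputed(word):
    return word in PRECOMPUTED

# Approach 2: work on the fly by stripping each candidate suffix
# and checking whether the remaining stem is a known base word.
def lookup_on_the_fly(word):
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and stem in BASE_SET:
            return True
    return False

QUERIES = ["word42", "word1999ing", "word7ed", "notaword", "word10s"] * 1000

if __name__ == "__main__":
    for name, fn in [("precomputed", lookup_precomputed),
                     ("on the fly", lookup_on_the_fly)]:
        seconds = timeit.timeit(lambda: [fn(q) for q in QUERIES], number=10)
        print(f"{name}: {seconds:.4f} s")
```

With real data, the on-the-fly side is where your actual conjugation logic would go, and the start-up cost of building `PRECOMPUTED` (or reading it from disk) should be timed separately.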
In case you need to do many lookups, it's probably a good idea to use a hash to store all the words (as keys). If the words aren't unique, i.e. if conjugations of different base words may lead to the same final form (and if you don't want to lose the information about which base word a final form originated from), you might want to accumulate that info in the hash elements' values... In addition to using a hash, it might help to read in the whole data structure once and keep the process running persistently, i.e. you might consider using some client/server architecture.
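A minimal sketch of that idea in Python, where a `dict` plays the role of the hash. The `ENTRIES` list and its grammatical tags are hypothetical example data. Each surface form maps to a list of every (base word, analysis) pair it can come from, so ambiguous forms such as "saw" keep all their origins:

```python
from collections import defaultdict

# Hypothetical precomputed entries: (surface form, base word, analysis tag).
ENTRIES = [
    ("walk", "walk", "V.INF"),
    ("walks", "walk", "V.3SG"),
    ("walked", "walk", "V.PAST"),
    ("saw", "see", "V.PAST"),
    ("saw", "saw", "N.SG"),
    ("saws", "saw", "N.PL"),
    ("saws", "saw", "V.3SG"),
]

def build_index(entries):
    """Map each surface form to every (base, tag) pair it can originate from."""
    index = defaultdict(list)
    for form, base, tag in entries:
        index[form].append((base, tag))
    return dict(index)

# Built once at start-up; a long-running process can reuse it for every query.
INDEX = build_index(ENTRIES)

def analyse(word):
    """Return all known analyses of word (an empty list if it is unknown)."""
    return INDEX.get(word, [])

print(analyse("saw"))  # [('see', 'V.PAST'), ('saw', 'N.SG')]
print(analyse("xyz"))  # []
```

Each lookup is a single hash access regardless of dictionary size, so in a persistent process the cost of building `INDEX` is paid only once rather than on every call.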
Generally, the whole thing very much sounds like the problem a typical spell/grammar checker has to solve (if I understood your task correctly, that is). I'm personally no expert in such algorithms, but I'm sure many people have put much thought into optimising these things... In other words, that's where I would start looking for publications, etc. (e.g. scientific ones). Good luck! And welcome here, BTW.