...so, what do you prefer? many lines containing little information each, or fewer lines containing a lot of information each?

To be able to give any useful advice, it would help to know the exact context that this program is going to be used in. For example, are you intending to call it once for every input item? How many input items are you planning to look up? Etc.

Also, as to speed, it's hard to make a precise prediction without knowing all the details (such as the hardware being used, the complexity of the conjugations, etc.). However, my gut feeling is that precomputing everything will be marginally faster, in particular if you need to compare more than one input item against the full set of words. Generally, if your machine has fast I/O, slurping in the whole precomputed dictionary might not be a problem, while if you have a fast processor, computing the conjugations / inflections on the fly might still be faster after all. If I were you, I would just benchmark the different approaches and go with whatever turns out to be faster.
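A rough sketch of such a benchmark, written in Python for brevity (the same comparison is easy to set up in Perl). Everything here is made up for illustration: the `inflect` rules, the `BASES` word list and the lookup functions are hypothetical stand-ins for your real conjugation code and dictionary, so only the relative timings on your own data would mean anything.

```python
import timeit

# Toy inflection rules, purely illustrative (an assumption, not real morphology).
SUFFIXES = ["", "s", "ed", "ing"]

def inflect(base):
    """Return all (toy) inflected forms of a base word."""
    return [base + suffix for suffix in SUFFIXES]

# Hypothetical list of base words standing in for the real dictionary.
BASES = [f"word{i}" for i in range(5000)]

def build_precomputed():
    """Approach 1: expand every base word once, up front."""
    return {form for base in BASES for form in inflect(base)}

PRECOMPUTED = build_precomputed()

def lookup_precomputed(item):
    """Look the item up in the fully expanded set of forms."""
    return item in PRECOMPUTED

def lookup_on_the_fly(item):
    """Approach 2: inflect each base word at lookup time."""
    return any(item in inflect(base) for base in BASES)

# A form derived from the last base word, so the on-the-fly scan runs to the end.
probe = "word4999ing"

build_time = timeit.timeit(build_precomputed, number=5)
pre_time = timeit.timeit(lambda: lookup_precomputed(probe), number=20)
fly_time = timeit.timeit(lambda: lookup_on_the_fly(probe), number=20)

print(f"build precomputed set (5 runs): {build_time:.4f}s")
print(f"precomputed lookup (20 runs):   {pre_time:.6f}s")
print(f"on-the-fly lookup (20 runs):    {fly_time:.6f}s")
```

The build time is reported separately because it is paid once per run of the program, so whether it matters depends on how many lookups each run performs.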

In case you need to do many lookups, it's probably a good idea to use a hash to store all the words (as keys). If the words aren't unique, i.e. if conjugations of different base words may lead to the same final form (and if you don't want to lose the info about where the final form originated), you might want to accumulate that info in the hash elements' values. In addition to using a hash, it might help to read in the whole data structure once and keep the process running persistently, i.e. you might consider using some client/server architecture.
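To illustrate the idea of accumulating the origin info in the hash values, here is a minimal sketch in Python, where a dict plays the role of the hash (in Perl this would be a hash of arrays). The `build_index` helper and the sample entries are hypothetical, just to show the shape of the data.

```python
from collections import defaultdict

def build_index(entries):
    """Map each inflected form to the list of base words it can come from."""
    index = defaultdict(list)
    for base, forms in entries:
        for form in forms:
            # Record each base word only once per form.
            if base not in index[form]:
                index[form].append(base)
    return index

# Hypothetical sample data: (base word, list of its inflected forms).
entries = [
    ("lie", ["lie", "lies", "lay", "lain"]),
    ("lay", ["lay", "lays", "laid"]),
]

index = build_index(entries)

print(index["lay"])       # ['lie', 'lay']: the form is ambiguous
print(index.get("laid"))  # ['lay']
```

Using `get` for lookups avoids creating empty entries for unknown words, which a plain `index[word]` would do with a `defaultdict`.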

Generally, the whole thing sounds very much like the problem a typical spell/grammar checker has to solve (if I understood your task correctly, that is). I'm personally no expert in such algorithms, but I'm sure many people have put a lot of thought into optimising these things. In other words, that's where I would start looking for publications (e.g. scientific ones). Good luck! And welcome here, BTW.


In reply to Re: reading dictionary file -> morphological analyser by almut
in thread reading dictionary file -> morphological analyser by pc2
