We already tested code similar to the one above. Besides checking the word, it also splits the "$irreg" variable on commas to check whether any of the irregular forms of the word equals the user input. Besides the "if ($input eq $lang)" check, it also contains an engine that conjugates $lang in all the tenses and moods of the language, to check whether the input equals a conjugated form. To do that, it calls a conjugating function, like this:

```perl
# $input (the word typed by the user) and conj() are defined elsewhere in the script.
open my $dicte, '<', 'dict.txt' or die "Cannot open dict.txt: $!";
if (length($input) > 0) {
    print "<p><b>$input</b></p>";
    while (<$dicte>) {
        chomp;
        # Get the grammatical information stored in one line of the dictionary.
        my ($english, $lang, $irreg, $clss) = split /;/, $_;
        # If the input equals the word in the dictionary, print it along with its translation.
        if ($input eq $lang) { print "<p>$english - $lang, $clss</p>"; }
        # If the input equals a conjugated form, print it too, where
        # conj("$word;$tense;$person;$number") conjugates the verb given that information.
        if (conj("$lang;present;1;singular") eq $input) { print "<p>$english - $lang, $clss</p>"; }
        if (conj("$lang;present;2;singular") eq $input) { print "<p>$english - $lang, $clss</p>"; }
        # ... and so on for every tense, person and number.
    }
}
close $dicte;
```

Great so far. The problem with this code is that as the number of possible conjugated forms grows, and especially when prefixes or suffixes also have to be checked, the analysis takes very long if the dictionary is big (in our case, 16008 words). So we tried it in two ways:

1. For every word in the dictionary, inflect it in all possible ways and compare each form with the user input (the code shown above).
2. The dictionary already contains all the conjugated and declined forms, so the analyser compares each line with the user input, with no need to conjugate or decline each word.

The problem with the first approach is that it gets too slow when the analyser itself has to conjugate each word before comparing it, in an inflectional language with many inflected forms. The problem with the second is that it gets too slow when the lexicon (root words plus inflected forms) is very big (for example, 385090 lines). We are concentrating on the second technique. It seems that the more information a single line holds, the slower each line is to read. So, which do you prefer: many lines with little information each, or fewer lines with a lot of information each? Any thoughts on this? Do you know any other way to make a faster analyser?

Thank you in advance,
Paulo Marcos Durand Segal & Claudio Marcos Durand Segal.
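The irregular-forms check mentioned above is not part of the posted code. Here is a minimal sketch of it. It assumes the english;lang;irreg;class line layout that the split in the code implies, and it assumes the irregular forms in the third field are separated by commas. The helper name matches_entry and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Sketch of the irregular-forms check. Assumes each dictionary line has the
# layout english;lang;irreg;class and that the irregular forms in the third
# field are separated by commas. matches_entry() is a hypothetical helper.
sub matches_entry {
    my ($input, $line) = @_;
    my ($english, $lang, $irreg, $clss) = split /;/, $line;
    # The dictionary word itself.
    return 1 if $input eq $lang;
    # Each comma-separated irregular form; an empty field yields no forms.
    for my $form (split /,/, ($irreg // '')) {
        return 1 if $input eq $form;
    }
    return 0;
}

# Hypothetical sample line (Portuguese "ir", "to go"); the real dict.txt is not shown.
my $line = 'to go;ir;vou,vais,vai;verb';
for my $word (qw(vai ir foi)) {
    print "$word: ", (matches_entry($word, $line) ? 'match' : 'no match'), "\n";
}
```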
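In approach 1, you do not need one if() per tense, person and number. The analyser can loop over every combination and stop at the first generated form that equals the input. The sketch below assumes a conj() that takes the same "word;tense;person;number" string as in the post. The conj() here is a hypothetical toy that only appends present-tense suffixes to a stem. It stands in for the real conjugation engine, and the tense, person and number lists would have to be extended to cover the whole language.

```perl
use strict;
use warnings;

# Hypothetical toy stand-in for the post's conj(): it takes the same
# "word;tense;person;number" string and appends a present-tense suffix
# to a stem. The real conjugation engine is not shown in the post.
my %suffix = (
    'present;1;singular' => 'o',
    'present;2;singular' => 'as',
    'present;3;singular' => 'a',
    'present;1;plural'   => 'amos',
    'present;2;plural'   => 'ais',
    'present;3;plural'   => 'am',
);

# Returns the generated form, or undef for a combination the toy table lacks.
sub conj {
    my ($word, $tense, $person, $number) = split /;/, shift;
    my $s = $suffix{"$tense;$person;$number"};
    return defined $s ? $word . $s : undef;
}

# Every combination to try; a real analyser would list all of the
# language's tenses and moods here.
my @tenses  = ('present');
my @persons = (1, 2, 3);
my @numbers = ('singular', 'plural');

# Returns a description of the first matching form, or undef if none matches.
sub find_analysis {
    my ($input, $lang) = @_;
    return 'base form' if $input eq $lang;
    for my $tense (@tenses) {
        for my $person (@persons) {
            for my $number (@numbers) {
                my $form = conj("$lang;$tense;$person;$number");
                return "$tense, person $person, $number"
                    if defined $form && $form eq $input;
            }
        }
    }
    return;
}

my $result = find_analysis('falamos', 'fal');
print defined $result ? "$result\n" : "no match\n";
```

This loop does not make approach 1 any cheaper. It still generates every form of every dictionary word for each query, which is the cost described above. It only replaces the repeated if() lines with a single loop.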
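The post does not describe the following technique, but it is a common way to speed up approach 2. Stop scanning the file on every query and instead index the full-form lexicon in a hash keyed by form. Each lookup then costs one hash access, no matter how many lines the lexicon has. The sketch assumes each lexicon line keeps the english;form;irreg;class layout of dict.txt, with one line per inflected form. The sample data and the name build_index are made up for illustration.

```perl
use strict;
use warnings;

# Sketch of approach 2 with a hash index instead of a line-by-line scan.
# Assumption: one line per inflected form, in the english;form;irreg;class
# layout of dict.txt. The sample lexicon below is made up.
my $lexicon = <<'END';
to speak;falo;;verb, present, 1st singular
to speak;falas;;verb, present, 2nd singular
to speak;falamos;;verb, present, 1st plural
END

# Build the index once: inflected form => list of entries with that form.
sub build_index {
    my ($fh) = @_;
    my %index;
    while (my $line = <$fh>) {
        chomp $line;
        my ($english, $form, $irreg, $clss) = split /;/, $line;
        next unless defined $form && length $form;
        push @{ $index{$form} }, { english => $english, clss => $clss };
    }
    return \%index;
}

# Read from the in-memory string; with the real file this would be
# open my $fh, '<', 'dict.txt' or die "Cannot open dict.txt: $!";
open my $fh, '<', \$lexicon or die "Cannot open lexicon: $!";
my $index = build_index($fh);
close $fh;

# Each query is now a single hash lookup instead of a pass over every line.
my $input = 'falamos';
for my $entry (@{ $index->{$input} || [] }) {
    print "<p>$entry->{english} - $input, $entry->{clss}</p>\n";
}
```

Building the index still reads the whole lexicon once, so it only pays off when the same process answers many queries. If the analyser runs as a CGI script that starts fresh on every request, the index would have to persist between requests. One option is a DBM file tied to a hash with a core module such as DB_File or SDBM_File (not shown here; SDBM_File limits the size of each key/value pair).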
In reply to reading dictionary file -> morphological analyser by pc2