in reply to reading dictionary file -> morphological analyser

Salutations, and thank you for the responses. Based on them, we tried several approaches. We created a Microsoft Access database and accessed it via Win32::OLE, but that was not convenient, because we would have to convert the .txt file to .mdb every time the dictionary changed. We also tried putting a line of the dictionary file into the hash only when it matched the user input, but that would not be convenient for multiple-word inputs, which we plan to implement.

So we concentrated on the hash approach, which seemed to be the best one, because looking up the value for a given key is very fast, even for a file with a very large number of lines. The only problem was generating the hash from the dictionary file, line by line. Then we found the Storable module (is this name correct?). With it, the hash is generated only once, saved to a file with the store() function, and then loaded back from the file with retrieve():
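#!c:\perl\bin\perl
use strict;
use warnings;
use Storable;    # provides store() and retrieve() for saving and loading the hash
my $hash_file = "hash.txt";    # Storable writes a binary file, despite the .txt name
my %dict;                      # the hash
if (-e $hash_file) {
    # the hash was generated on an earlier run, so just load it from the file
    %dict = %{ retrieve($hash_file) };
}
else {
    # first run: build the hash from the dictionary, line by line
    while (my $line = <DATA>) {
        chomp $line;
        $dict{$line}++;    # instead of ++ you could also assign some value
    }
    store(\%dict, $hash_file);    # store the hash in the file for later runs
}
my @inputs = qw( foo fooed fooen prefoo postfoo );
for my $input (@inputs) {
    print "found '$input' in lexicon\n" if exists $dict{$input};
}
# example dictionary entries, one per line (replace with your own dictionary)
__DATA__
foo
fooed
prefoo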
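Because the stored hash goes stale whenever the dictionary is edited (the same issue that made the Access/.mdb route inconvenient), one option is to rebuild it only when the text dictionary is newer than the stored file. The following is a minimal sketch under some assumptions: the file names `dict.txt` and `dict.sto`, the subroutine names `load_dict` and `found_words`, and the script name `lookup.pl` are all hypothetical. It compares file ages with Perl's `-M` test (age in days, so a smaller value means a newer file) and also shows one simple way to handle multiple-word input, by splitting it on whitespace and looking up each word in the hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical file names: adjust them to your own dictionary and cache files.
my $dict_file = 'dict.txt';    # plain-text dictionary, one entry per line
my $hash_file = 'dict.sto';    # binary file written by Storable

# Return a reference to the dictionary hash. The stored copy is reused
# unless it is missing or older than the text dictionary.
sub load_dict {
    my ($txt, $cache) = @_;
    # -M is the file's age in days, so a smaller value means a newer file
    if (-e $cache && (! -e $txt || -M $cache <= -M $txt)) {
        return retrieve($cache);
    }
    my %dict;
    open my $fh, '<', $txt or die "Cannot open $txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $dict{$line}++;
    }
    close $fh;
    store(\%dict, $cache);    # croaks on I/O errors
    return \%dict;
}

# Split a multiple-word input on whitespace and return the words
# that exist in the dictionary, in input order.
sub found_words {
    my ($dict, $input) = @_;
    return grep { exists $dict->{$_} } split ' ', $input;
}

# Example usage: perl lookup.pl foo fooed postfoo
if (@ARGV) {
    my $dict = load_dict($dict_file, $hash_file);
    print "found '$_' in lexicon\n" for found_words($dict, join ' ', @ARGV);
}
```

With this design, editing `dict.txt` is enough to trigger a rebuild on the next run; no separate conversion step is needed.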
By using the very fast hash and retrieving it from a previously generated file, the program runs much faster. So, again, thank you for the responses; they were very helpful in finding a good approach to our problem.

Note: we are neither professional linguists nor professional programmers (we can't work yet anyway, because of our age); we do this just because we enjoy it (both the programming part and the linguistic part). By the way, we are from Brazil, and we are twins (which explains all the "we"s). Salutations.

Re^2: reading dictionary file -> morphological analyser
by pc2 (Beadle) on Jul 18, 2007 at 11:45 UTC
    Just to let you know: the message above by "Anonymous Monk" is ours; we forgot to log in when we posted it.