in reply to reading dictionary file -> morphological analyser

Salutations. jhourcle, we couldn't use the stemming technique because the language in question involves root changes and many irregular forms. BrowserUk, dividing the dictionary might be a good solution, but it would not fit well with the way we plan to process user input, since inputs will often contain many words. We implemented a solution using Berkeley DB:
#!c:/perl/bin/perl.exe
print "Content-type: text/html\n\n";
use BerkeleyDB;
my $filename = "dict";
my $db = new BerkeleyDB::Hash
    -Filename => $filename,
    -Flags    => DB_CREATE
    or die "Cannot open file $filename: $! $BerkeleyDB::Error\n";
Getting the value for a given key is done with:
$db->db_get("word", $lang); # the value stored under the key "word" is placed in $lang
This works very fast. So the DICTE filehandle code, instead of putting values into a %dict hash, now puts them into the database with $db->db_put($english, $lang). The database only has to be rebuilt when the dictionary changes, but building it is very slow. So now the question is: is there a command-line Windows program that can convert flat-file databases to Berkeley DB databases? Thank you in advance.

Re^2: reading dictionary file -> morphological analyser
by mr_mischief (Monsignor) on Jul 19, 2007 at 00:30 UTC
    Part of the utility of a database is that you can update it in place, rather than regenerating it every time there's a change. If your language changes, say from Spanish to Russian, then yes, you'd have to swap out the database. If you're working in one language, though, I would suspect most of the words stay the same, and you'll only be making additions and corrections over time.

    I would suggest learning to take advantage of what a database gives you. Whether you stick with Berkeley DB or use something SQL-centered, you should be looking to update the database in place instead of regenerating it on a regular basis.
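
    As a rough illustration of updating in place, here is a minimal sketch that applies only the changed entries to an existing dictionary. The apply_updates name, the change-list format, and the sample word pairs are all made up for this example. A plain hash stands in for the database so the code runs without any extra modules; the comment shows how the hash could be tied to the Berkeley DB file instead, assuming the BerkeleyDB module's tie interface.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply a list of changes to a dictionary held in a hash reference.
# Each change is [ $english, $translation ]; an undefined translation
# means "delete this entry". Returns the number of entries actually changed.
# (apply_updates and this change format are hypothetical, for illustration.)
sub apply_updates {
    my ($dict, @changes) = @_;
    my $touched = 0;
    for my $change (@changes) {
        my ($english, $translation) = @$change;
        if (!defined $translation) {
            # Removal: only count it if the key was actually present.
            if (exists $dict->{$english}) {
                delete $dict->{$english};
                $touched++;
            }
        }
        elsif (!exists $dict->{$english}
            || $dict->{$english} ne $translation) {
            # Addition or correction: write only when the value differs.
            $dict->{$english} = $translation;
            $touched++;
        }
    }
    return $touched;
}

# A plain hash stands in for the database here (sample data is invented).
# With BerkeleyDB installed, the hash could be tied to the existing file:
#   use BerkeleyDB;
#   tie my %dict, 'BerkeleyDB::Hash',
#       -Filename => 'dict', -Flags => DB_CREATE
#       or die "Cannot open dict: $! $BerkeleyDB::Error\n";
my %dict = (house => 'casa', dog => 'pero', obsolete => 'x');

my $touched = apply_updates(
    \%dict,
    [ dog      => 'perro' ],    # correction
    [ cat      => 'gato'  ],    # addition
    [ house    => 'casa'  ],    # unchanged, so nothing is written
    [ obsolete => undef   ],    # removal
);

print "$touched entries changed\n";
print "$_ => $dict{$_}\n" for sort keys %dict;
```

    The point of the sketch is that the cost is proportional to the number of changes, not to the size of the dictionary, so the slow full rebuild only has to happen once.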