pc2 has asked for the wisdom of the Perl Monks concerning the following question:

Salutations. We are starting to learn Perl, and we wrote the following Perl code using Berkeley DB:
#!c:/perl/bin/perl.exe
print "Content-type: text/html\n\n";
use BerkeleyDB;
my $filename = "dict";
my $db = new BerkeleyDB::Hash
    -Filename => $filename,
    -Flags    => DB_CREATE
    or die "Cannot open file $filename: $! $BerkeleyDB::Error\n";
This question doesn't concern Perl directly, but: is there any command-line Windows program, or Perl code, to convert from some common database format (like CSV or MySQL) to Berkeley DB? Thank you in advance.

Replies are listed 'Best First'.
Re: converting from some database format to Berkeley DB.
by jZed (Prior) on Jul 19, 2007 at 00:37 UTC
    The module DBD::DBM supports BerkeleyDB. That means that you can use DBI to select from any database source (including CSV and MySQL) and insert into a BerkeleyDB database. You could also just use DBI's selectall_hashref() to load the data source into a hash and then tie that to your BerkeleyDB.
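    To make the tie idea concrete, here is a minimal sketch. The column names (word, meaning) and the sample rows are invented for illustration, and the DBI and BerkeleyDB lines are commented out so the hash-shaping step runs on its own; uncomment them against a real data source:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical: pull all rows via DBI, keyed by the 'word' column.
# use DBI;
# my $dbh  = DBI->connect('DBI:CSV:');
# my $rows = $dbh->selectall_hashref('SELECT word, meaning FROM dict', 'word');
my $rows = {    # stand-in for selectall_hashref() output
    apple => { word => 'apple', meaning => 'a fruit'    },
    perl  => { word => 'perl',  meaning => 'a language' },
};

my %berk;
# Hypothetical: tie the target hash to a BerkeleyDB file so assignments persist.
# use BerkeleyDB;
# tie %berk, 'BerkeleyDB::Hash', -Filename => 'dict', -Flags => DB_CREATE
#     or die "Cannot tie: $BerkeleyDB::Error\n";

# Copy each row's value into the (tied) hash.
$berk{$_} = $rows->{$_}{meaning} for keys %$rows;

print "$_=$berk{$_}\n" for sort keys %berk;
```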
      Salutations, thank you for the response, jZed. We installed the DBD::CSV module via ActivePerl's package manager, then we wrote the following DBD::CSV code, which uses ";" as the field separator:
      use DBI;
      $dbh = DBI->connect(qq{DBI:CSV:csv_sep_char=\\;});
      $dbh->{'csv_tables'}->{'dict'} = { 'file' => 'dict.txt' };
      $sth = $dbh->prepare("SELECT * FROM dict");
      $sth->execute() or die "Cannot execute: " . $sth->errstr();
      How can we proceed to transform the table "dict" (that is, the "dict.txt" file) into a BerkeleyDB file? Is some other module necessary? Note: the $dbh->prepare() and $sth->execute() lines in this code seem to be too slow; could that be because the dict file is so big (385090 lines)? Thank you in advance.
        Hi. A little detective work led me to your previous posting, which supplies some context for your question here. Yes, DBD::CSV will be slow with a file as large as the one you are dealing with. The fastest way to get your data from a CSV file into a quickly-searchable form is to use the loading mechanism of a database (for example, LOAD DATA INFILE with MySQL). Using a database would also simplify and speed up future updates and searches. If you absolutely must use BerkeleyDB instead of a database, then you can convert from CSV to BerkeleyDB with something like this:
        #!/usr/bin/perl
        use warnings;
        use strict;
        use BerkeleyDB;

        my ($csv_file, $berk_file) = qw( dict.txt dict );
        my $db = BerkeleyDB::Hash->new(
            -Filename => $berk_file,
            -Flags    => DB_CREATE,
        ) or die "Cannot open file '$berk_file': $! $BerkeleyDB::Error\n";
        open( DICTE, $csv_file ) or die "Cannot open file '$csv_file': $!\n";
        while (<DICTE>) {     # read line by line rather than slurping all 385090 lines
            chomp;
            my ($key, $value) = split( /;/, $_, 2 );
            $db->db_put( $key, $value );
        }
        close DICTE;
        # the file "dict" is now a BerkeleyDB file with the entire
        # contents of the CSV file "dict.txt"
Re: converting from some database format to Berkeley DB.
by pc2 (Beadle) on Jul 20, 2007 at 23:40 UTC
    Salutations, we have already tried this technique of passing the CSV data to a BerkeleyDB database via $db->db_put(), but it was too slow. Anyway, we found a very good way to convert from TXT to BerkeleyDB: we convert it by using the db_load command at the command line. For this, besides Perl's BerkeleyDB module, we also installed the Windows installer of Berkeley DB from http://www.oracle.com/technology/software/products/berkeley-db/index.html. Then we discovered that the command (at the command line)
    db_load -c duplicates=1 -T -t hash -f dict.txt dict.db
    converts "dict.txt" (with keys and values on alternating lines, each pair of lines forming one record) into the BerkeleyDB database "dict.db", allowing duplicate keys. This solution turned out to work great. Thank you for all the help.
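    For anyone following the db_load route: with -T and -t hash, db_load reads keys and values on alternating lines, so a semicolon-separated dict.txt first needs reshaping into that format. A minimal sketch (the sample lines are invented; in practice, read the real file line by line):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one "key;value" CSV line into the two lines db_load -T expects.
sub to_db_load {
    my ($line) = @_;
    chomp $line;
    my ($key, $value) = split /;/, $line, 2;
    return "$key\n$value\n";
}

# Tiny in-memory demonstration; for real use, write the result to a dump file.
my @csv  = ( "apple;a fruit\n", "perl;a language\n" );
my $dump = join '', map { to_db_load($_) } @csv;
print $dump;
```

    Writing the reshaped text to, say, dict.dump and then running `db_load -c duplicates=1 -T -t hash -f dict.dump dict.db` should produce the same kind of database described above.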