Hello, monks:
I am quite new to BerkeleyDB and would like some guidance from experienced developers before digging into the details of tuning BerkeleyDB for better performance.
So far I have built several BerkeleyDB files (from 20MB to 1.5GB, using both BerkeleyDB::Hash and BerkeleyDB::Btree), and as of this writing they are all working well.
Most of my BerkeleyDB files are built with code like the following sample:
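#!/usr/bin/perl
use strict;
use warnings;
use BerkeleyDB;

my $db_file = '/path/to/lib/myapp.db';
unlink $db_file if -f $db_file;

# Tie a hash to a new Btree file; $bdb is the underlying DB object.
my $bdb = tie my %tree, 'BerkeleyDB::Btree',
    -Filename => $db_file,
    -Flags    => DB_CREATE
    or die "cannot open $db_file: $! $BerkeleyDB::Error\n";

my $raw_file = '/path/to/raw.dat';
open my $fin, '<', $raw_file
    or die "can not open $raw_file for reading: $!";

while (<$fin>) {
    chomp;                          # keep the newline out of the stored value
    my ($key, $value) = split /\t/;
    next if not fit_condition($key);
    $bdb->db_put($key, $value) == 0
        or die "db_put failed for '$key': $BerkeleyDB::Error\n";
}
close $fin;

# Drop the object reference before untie so the file is closed cleanly.
undef $bdb;
untie %tree;

sub fit_condition
{
    #.skip.#
}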
For an input of about 50M key-value pairs, the code above took about 10 hours to build the DB file (about 17M key-value pairs, 1.1GB on disk). My questions are:
- Are there BerkeleyDB::Env parameters I can use to speed up importing raw data into BerkeleyDB?
- How should I set -Cachesize in BerkeleyDB::Env, given about 16GB of RAM? How does the cache size affect reads and writes differently in BerkeleyDB? Is there a rule of thumb for what percentage of RAM to devote to caching (in BerkeleyDB, or MySQL on a DB server), if there is one :)
- How should I stringify the value if I want to store a complex data structure instead of a plain string? JSON, or are there better ways?
- What is the most important factor when choosing between BerkeleyDB::Btree and BerkeleyDB::Hash? What other application-level considerations matter, beyond the algorithms themselves?
- Is it possible to open a BerkeleyDB database read-only in some applications, so that I don't need to worry about accidental modification of the data? And if such an accidental mis-operation does happen, how can I recover the DB file other than by backing it up regularly?
- Can I use the same BerkeleyDB database file generated by Perl in PHP code?
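For the Env, cache and read-only questions, here is a minimal sketch of the pieces involved: a BerkeleyDB::Env with DB_INIT_MPOOL and an explicit -Cachesize, a Btree opened inside it for loading, then the same file reopened with DB_RDONLY. The temporary home directory, the file name `myapp.db`, and the 64MB cache value are illustrative assumptions, not tuning advice; a write through the read-only handle is expected to come back with a non-zero status rather than modify the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Hypothetical environment home; it must exist before the Env is opened.
my $home = tempdir(CLEANUP => 1);

# Environment with a shared buffer pool (DB_INIT_MPOOL).
# 64MB is only an illustrative value for this sketch.
my $env = BerkeleyDB::Env->new(
    -Home      => $home,
    -Flags     => DB_CREATE | DB_INIT_MPOOL,
    -Cachesize => 64 * 1024 * 1024,
) or die "cannot open environment: $BerkeleyDB::Error\n";

# Writable handle, used only during the load phase.
my $rw = BerkeleyDB::Btree->new(
    -Filename => 'myapp.db',    # relative to -Home
    -Env      => $env,
    -Flags    => DB_CREATE,
) or die "cannot open db: $BerkeleyDB::Error\n";

$rw->db_put("apple", "red") == 0
    or die "db_put failed: $BerkeleyDB::Error\n";
$rw->db_close;
undef $rw;

# Readers reopen the same file with DB_RDONLY.
my $ro = BerkeleyDB::Btree->new(
    -Filename => 'myapp.db',
    -Env      => $env,
    -Flags    => DB_RDONLY,
) or die "cannot open db read-only: $BerkeleyDB::Error\n";

my $value;
my $get_status = $ro->db_get("apple", $value);
my $put_status = $ro->db_put("pear", "green");   # should be refused

print "get: ", ($get_status == 0 ? "ok, value=$value" : "failed"), "\n";
print "put on read-only handle: ",
      ($put_status == 0 ? "accepted" : "rejected"), "\n";

# Close the database before the environment.
undef $ro;
undef $env;
```

Separating the loader (writable) from the readers (DB_RDONLY) guards against accidental writes from application code, though it does not replace backups for recovering from a bad load.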
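On storing complex values, one option is to keep JSON as the on-disk format and let BerkeleyDB's DBM filters do the conversion, so the tied hash reads and writes Perl data structures directly. This sketch assumes JSON::PP (core Perl since 5.14) and a hypothetical temporary file; Storable's `freeze`/`thaw` could be dropped into the same two filters instead, at the cost of a Perl-only format that PHP could not read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use BerkeleyDB;
use JSON::PP;
use File::Temp qw(tempdir);

# Hypothetical database location for this sketch.
my $dir     = tempdir(CLEANUP => 1);
my $db_file = "$dir/complex.db";

# utf8: encode to bytes / decode from bytes; canonical: stable key order.
my $json = JSON::PP->new->utf8->canonical;

my $db = tie my %h, 'BerkeleyDB::Hash',
    -Filename => $db_file,
    -Flags    => DB_CREATE
    or die "cannot open $db_file: $! $BerkeleyDB::Error\n";

# DBM filters: serialize each value to JSON on the way in,
# and decode it back into a Perl structure on the way out.
$db->filter_store_value(sub { $_ = $json->encode($_) });
$db->filter_fetch_value(sub { $_ = $json->decode($_) if defined $_ });

$h{alice} = { age => 30, tags => [ 'admin', 'dev' ] };

my $rec = $h{alice};
print "age=$rec->{age} tags=@{ $rec->{tags} }\n";

undef $db;
untie %h;
```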
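One application-level difference behind the Btree vs. Hash question: a Btree keeps its keys sorted (byte-wise by default), so a cursor can do range or prefix scans, while a Hash gives no meaningful key order. The sketch below positions a cursor with DB_SET_RANGE at the first key greater than or equal to "b" and walks forward with DB_NEXT; the fruit keys and the temporary file are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Hypothetical database location for this sketch.
my $dir = tempdir(CLEANUP => 1);
my $db  = BerkeleyDB::Btree->new(
    -Filename => "$dir/sorted.db",
    -Flags    => DB_CREATE,
) or die "cannot open db: $BerkeleyDB::Error\n";

# Insert in arbitrary order; the Btree stores the keys sorted.
for my $word (qw(cherry apple banana apricot blueberry)) {
    $db->db_put($word, length $word) == 0
        or die "db_put failed: $BerkeleyDB::Error\n";
}

# Prefix scan for keys starting with "b".
my @found;
my ($k, $v) = ("b", "");
my $cursor = $db->db_cursor();
my $status = $cursor->c_get($k, $v, DB_SET_RANGE);  # first key >= "b"
while ($status == 0 && $k =~ /^b/) {
    push @found, $k;
    $status = $cursor->c_get($k, $v, DB_NEXT);      # next key in order
}

# Close the cursor before the database.
undef $cursor;
undef $db;

print "@found\n";
```

With a BerkeleyDB::Hash the same loop would not work, because keys come back in hash order; point lookups are where a Hash can compete.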
Other information:
- $BerkeleyDB::db_version => 4.3
- $BerkeleyDB::VERSION => 0.34
Thank you for any helpful suggestions or links.
lihao