After many long hours and lots of trial and error, I think that I have mastered the art of concurrent access in BerkeleyDB. Hoo-ha!

I am using a BDB v4.2 database, Perl 5.8.2 and BerkeleyDB.pm v0.25. My test script forks multiple children at once ($loops + 1 of them, since the loop runs from 0 through $loops), and each child performs a series of inserts followed by a series of selects.

I spent the better part of yesterday trying to get this script to run without locking up or returning error messages. I tried opening the database with the DB_INIT_TXN flag, with the DB_INIT_CDB/DB_INIT_MPOOL flags, and with the DB_INIT_LOCK flag. I do not think locks are implemented in the BerkeleyDB.pm module yet, so that last flag is probably useless.

No matter what combination or order I used, the inserts would either hang or return a bad status. After much reading of PerlMonks, Sleepycat's documentation, the Ruby BDB documentation, source code, and tests, I finally backed into a solution which so far has managed to fire off 19 simultaneous children inserting 5000 records each. I cannot imagine my intranet application will ever stress these limits!

Comments on my testing logic or use of BerkeleyDB are much appreciated.

#!/usr/bin/perl
use strict;
use warnings;
use BerkeleyDB;
use Benchmark qw(timethese);
use IO::Handle;

our $iter  = 5_000;
our $loops = 5;
our @pids;
our $txt = 'a' x 100;

# Start from a clean database file
unlink("bdb4.hash_t.test");

for (my $i = 0; $i <= $loops; $i++) {
    my $pid = fork;
    die "cannot fork: $!" unless defined $pid;

    if ($pid) {
        # Parent: remember the child so we can wait for it later
        push @pids, $pid;
    }
    else {
        # Child process
        print "\n", "-" x 25, "\n| Iteration #$i\n", "-" x 25, "\n";

        # do some setup
        my $id_BDB4_hash;
        my $bdb4_hash;
        my $env;
        my %hash;

        sub insertBDB4_hash {
            warn "Inserting $id_BDB4_hash...\n";
            my $status = $bdb4_hash->db_put($id_BDB4_hash++, $txt);
            die "Bad exit status: $status" if $status;
        }

        sub selectBDB4_hash {
            my $v;
            my $status = $bdb4_hash->db_get($id_BDB4_hash++, $v);
            warn "Bad exit status: $status" if $status;
        }

        # Set environment
        my $dbhome = ".";
        my $dberr  = "err";
        $env = BerkeleyDB::Env->new(
            -Home    => $dbhome,
            -ErrFile => $dberr,
            -Flags   => (DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL),
        );
        if (!$env) {
            die "could not create env: $! '$BerkeleyDB::Error'.\n";
        }

        # Open tied database
        $bdb4_hash = tie %hash, "BerkeleyDB::Hash",
            -Filename => "bdb4.hash_t.test",
            -Flags    => DB_CREATE,
            -Env      => $env,
          or die "Cannot open file bdb4.hash_t.test (iter #$i): $! $BerkeleyDB::Error\n";

        # Each child works on its own key range, starting at $i * $iter
        $id_BDB4_hash = ($i * $iter);
        my $r = timethese(
            $iter - 1,
            { 'I BerkeleyDB::Hash' => \&insertBDB4_hash, },
        );

        $id_BDB4_hash = ($i * $iter);
        $r = timethese(
            $iter - 1,
            { 'S BerkeleyDB::Hash' => \&selectBDB4_hash, },
        );
        exit;
    }
}

# wait for the children to finish
foreach my $childpid (@pids) {
    waitpid($childpid, 0);
}

# Verify integrity of files when all work is complete
print "All finished!\n\n";
my $count = $iter * ($loops + 1) - 1;
my $actual;
my $last_row;

# Print the contents of the file
tie my %bdb4_hash, "BerkeleyDB::Hash",
    -Filename => "bdb4.hash_t.test",
    -Flags    => DB_CREATE
  or die "Cannot open file bdb4.hash_t.test: $! $BerkeleyDB::Error\n";

#foreach my $k (sort {$a <=> $b} keys %bdb4_hash) {
#    print "$k -> $bdb4_hash{$k}\n";
#}

$actual   = scalar keys %bdb4_hash;
$last_row = $count - 1;
print "BDB4_Hash_t rows inserted (actual/attempted) = $actual / $count\n";
print <<EOF;
Results of last row ($last_row):
BerkeleyDB::Hash (T): $bdb4_hash{ $last_row }
EOF

-Wm

Re: Testing DB concurrency with BerkeleyDB
by perrin (Chancellor) on Feb 20, 2004 at 18:34 UTC
    Looks fine. You seem to be using the same flags that I use. Using transactions and page-level locking is a mistake for most applications, since it requires you to handle deadlocks and makes everything more complex.

    Here's some sample code:

    use BerkeleyDB;

    my %Cache;

    # Concurrent Data Store environment: no transactions, no explicit locking
    my $env = BerkeleyDB::Env->new(
        -Home  => '/tmp',
        -Flags => DB_INIT_CDB | DB_CREATE | DB_INIT_MPOOL,
    ) or die "can't create BerkeleyDB::Env: $!";

    my $Obj = tie %Cache, 'BerkeleyDB::Btree',
        -Filename => '/tmpfs/bdbfile',
        -Flags    => DB_CREATE,
        -Mode     => 0640,
        -Env      => $env
      or die "Can't tie to /tmpfs/bdbfile: $!";

    I use the db_get/db_put calls with this, and no explicit locking, and it scales very well.
Re: Testing DB concurrency with BerkeleyDB
by demerphq (Chancellor) on Feb 22, 2004 at 10:38 UTC

    Just on a style level, you might want to rework the code so those subroutine declarations aren't inside the loop. Or, probably closer to your original intent, rewrite them as anonymous subs directly inside the timethese call.
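
    A minimal sketch of the anonymous-sub form, for illustration only. To keep it runnable without a database, a plain hash (%store) stands in for the BerkeleyDB handle; that stand-in, the benchmark labels, the smaller $iter and the $missing counter are all assumptions, not part of the original script. In the real test the closures would call db_put and db_get instead.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(timethese);

    my $iter = 1_000;       # illustrative; the original script uses 5_000
    my $txt  = 'a' x 100;
    my %store;              # hypothetical stand-in for the BerkeleyDB handle
    my $id   = 0;

    # The anonymous sub closes over $id, $txt and %store, so no named
    # sub has to be declared inside a loop or block.
    timethese(
        $iter,
        { 'I plain hash' => sub { $store{ $id++ } = $txt } },
        'none',             # 'none' suppresses Benchmark's printed report
    );

    # Reset the counter and read the same keys back.
    $id = 0;
    my $missing = 0;
    timethese(
        $iter,
        { 'S plain hash' => sub { $missing++ unless defined $store{ $id++ } } },
        'none',
    );

    print "stored: ", scalar(keys %store), ", missing: $missing\n";
    ```

    Because each closure is created at the point where the variables it uses are in scope, there is no question about which instance of a lexical a named sub captured.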


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi