in reply to SQLite vs CDB_File vs BerkeleyDB
Well as I stated earlier I had serious reservations about the validity of your benchmark. When I messaged you about it you replied "The point was to use all the modules' simplest interfaces". This is a specious argument. There is no difference between the interface for a DB_HASH or a DB_BTREE. The difference is underneath at the implementation level. And the difference is huge.
I reran your benchmark without CDB_File, it wont build cleanly on W32, and produced the following results which incidentally do not completely support my earlier coments ( the file size differences are not what I said they would be.) However they do show that your comparison of DB_HASH against DBD::SQLite is misleading (SQLite gets _creamed_ by a $DB_TREE), and they also show that if you are looking for high speed with a BerkeleyDB then you should definately go with a $DB_BTREE instead of a $DB_HASH.
My results were as follows:
Benchmark: timing 30000 iterations of
I B+Tree, I B+Tree (A), I Hash, I Hash (A), I SQLite, I SQLite 2...
I B+Tree: 1 wallclock secs ( 0.88 usr + 0.11 sys = 0.98 CPU) @ 30487.80/s (n=30000)
I B+Tree (A): 1 wallclock secs ( 0.95 usr + 0.02 sys = 0.97 CPU) @ 30959.75/s (n=30000)
I Hash: 8 wallclock secs ( 2.25 usr + 4.38 sys = 6.63 CPU) @ 4528.30/s (n=30000)
I Hash (A): 7 wallclock secs ( 2.31 usr + 4.58 sys = 6.89 CPU) @ 4353.50/s (n=30000)
I SQLite: 76 wallclock secs (14.84 usr + 49.66 sys = 64.50 CPU) @ 465.12/s (n=30000)
I SQLite 2: 4 wallclock secs ( 3.30 usr + 0.09 sys = 3.39 CPU) @ 8849.56/s (n=30000)
Rate I SQLite I Hash (A) I Hash I SQLite 2 I B+Tree I B+Tree (A)
I SQLite 465/s -- -89% -90% -95% -98% -98%
I Hash (A) 4354/s 836% -- -4% -51% -86% -86%
I Hash 4528/s 874% 4% -- -49% -85% -85%
I SQLite 2 8850/s 1803% 103% 95% -- -71% -71%
I B+Tree 30488/s 6455% 600% 573% 245% -- -2%
I B+Tree (A) 30960/s 6556% 611% 584% 250% 2% --
Benchmark: timing 30000 iterations of
S B+Tree, S B+Tree(A), S Hash, S Hash(A), S SQLite, S SQLite 2...
S B+Tree: 1 wallclock secs ( 0.77 usr + 0.09 sys = 0.86 CPU) @ 34965.03/s (n=30000)
S B+Tree(A): 1 wallclock secs ( 0.95 usr + 0.05 sys = 1.00 CPU) @ 30000.00/s (n=30000)
S Hash: 2 wallclock secs ( 1.03 usr + 1.63 sys = 2.66 CPU) @ 11295.18/s (n=30000)
S Hash(A): 3 wallclock secs ( 1.47 usr + 1.36 sys = 2.83 CPU) @ 10608.20/s (n=30000)
S SQLite: 11 wallclock secs ( 6.80 usr + 3.69 sys = 10.49 CPU) @ 2861.23/s (n=30000)
S SQLite 2: 4 wallclock secs ( 3.89 usr + 0.09 sys = 3.98 CPU) @ 7530.12/s (n=30000)
Rate S SQLite S SQLite 2 S Hash(A) S Hash S B+Tree(A) S B+Tree
S SQLite 2861/s -- -62% -73% -75% -90% -92%
S SQLite 2 7530/s 163% -- -29% -33% -75% -78%
S Hash(A) 10608/s 271% 41% -- -6% -65% -70%
S Hash 11295/s 295% 50% 6% -- -62% -68%
S B+Tree(A) 30000/s 948% 298% 183% 166% -- -14%
S B+Tree 34965/s 1122% 364% 230% 210% 17% --
Resulting File Sizes:
bdb.hash.test :5226496
bdb.btree.test :6701056
bdb.hash.a.test :5095424
bdb.btree.a.test :3579904
sqlite.test :3845120
sqlite2.test :2043904
And the code I used to produce this benchmark is
As Matts said, please no comments about the benchmarking code, I know its could be a lot nicer. (Its only a benchmark... :-)use DB_File; use DBI; use Benchmark qw(cmpthese); use strict; use warnings; # do some setup my $id_BDB_hash = 0; my $id_BDB_hash_a = 0; my $id_BDB_btree_a = 0; my $id_BDB_btree = 0; my $id_lit2 = 0; my $id_lit = 0; my $txt = 'a' x 100; unlink( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a.test", "sqlite.test", "sqlite2.test" ); my $dbh = DBI->connect( 'dbi:SQLite:dbname=sqlite.test', '', '', { AutoCommi +t => 1, RaiseError => 1 } ); my $dbh2 = DBI->connect( 'dbi:SQLite:dbname=sqlite2.test', '', '', { AutoComm +it => 0, RaiseError => 1 } ); $dbh->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100)) +'); $dbh2->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100) +)'); $dbh->do('PRAGMA default_synchronous = off'); my $sth_ins = $dbh->prepare('INSERT INTO test (id, foo) values (?, ?) +'); my $sth_sel = $dbh->prepare('SELECT foo FROM test where id=?'); my $sth2_ins = $dbh2->prepare('INSERT INTO test (id, foo) values (?, ? +)'); my $sth2_sel = $dbh2->prepare('SELECT foo FROM test where id=?'); tie my %bdb_btree_a, "DB_File", "bdb.btree.a.test", O_RDWR | O_CREAT, +0640, $DB_BTREE or die "Cannot open file 'bdb.btree.a.test': $!\n"; tied(%bdb_btree_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 +) } ); tied(%bdb_btree_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) +} ); tie my %bdb_hash_a, "DB_File", "bdb.hash.a.test", O_RDWR | O_CREAT, 06 +40, $DB_HASH or die "Cannot open file 'bdb.hash.a.test': $!\n"; tied(%bdb_hash_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 ) + } ); tied(%bdb_hash_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) } + ); tie my %bdb_hash, "DB_File", "bdb.hash.test", O_RDWR | O_CREAT, 0640, +$DB_HASH or die "Cannot open file 'bdb.hash.test': $!\n"; tie my %bdb_btree, "DB_File", "bdb.btree.test", O_RDWR | O_CREAT, 0640 +, $DB_BTREE or die "Cannot open file 'bdb.btree.test': $!\n"; sub insertLite { $sth_ins->execute( $id_lit++, $txt ); } sub insertLite2 { $sth2_ins->execute( $id_lit2++, $txt ); } sub insertBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ } = $txt; } sub insertBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ } = $txt; } sub insertBDB_btree { $bdb_btree{ $id_BDB_btree++ } = $txt; } sub insertBDB_hash { $bdb_hash{ $id_BDB_hash++ } = $txt; } sub selectLite { $sth_sel->execute( $id_lit++ ); 1 while $sth_sel->fetch; } sub selectLite2 { $sth2_sel->execute( $id_lit2++ ); 1 while $sth2_sel->fetch; } sub selectBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ }; } sub selectBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ }; } sub selectBDB_btree { $bdb_btree{ $id_BDB_btree++ }; } sub selectBDB_hash { $bdb_hash{ $id_BDB_hash++ }; } cmpthese( 30_000, { 'I SQLite' => \&insertLite, 'I SQLite 2' => \&insertLite2, 'I Hash' => \&insertBDB_hash, 'I B+Tree' => \&insertBDB_btree, 'I Hash (A)' => \&insertBDB_hash_a, 'I B+Tree (A)' => \&insertBDB_btree_a, }, ); $id_BDB_hash = 0; $id_BDB_hash_a = 0; $id_BDB_btree_a = 0; $id_BDB_btree = 0; $id_lit2 = 0; $id_lit = 0; cmpthese( 30_000, { 'S SQLite' => \&selectLite, 'S SQLite 2' => \&selectLite2, 'S Hash' => \&selectBDB_hash, 'S B+Tree' => \&selectBDB_btree, 'S Hash(A)' => \&selectBDB_hash_a, 'S B+Tree(A)' => \&selectBDB_btree_a, }, ); foreach my $file ( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a +.test", "sqlite.test", "sqlite2.test" ) { print "$file :" . ( -s $file ) . "\n"; }
Yves / DeMerphq
--
Benchmarking things isn't as easy or as straightforward as many people seem to think....
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: New Benchmark: SQLite vs CDB_File vs BerkeleyDB
by Anonymous Monk on Jan 20, 2011 at 16:09 UTC | |
by Anonymous Monk on Jan 20, 2011 at 18:10 UTC | |
by Anonymous Monk on Oct 26, 2011 at 11:50 UTC |