in reply to SQLite vs CDB_File vs BerkeleyDB

Hi.

Well as I stated earlier I had serious reservations about the validity of your benchmark. When I messaged you about it you replied "The point was to use all the modules' simplest interfaces". This is a specious argument. There is no difference between the interface for a DB_HASH or a DB_BTREE. The difference is underneath at the implementation level. And the difference is huge.

I reran your benchmark without CDB_File, it wont build cleanly on W32, and produced the following results which incidentally do not completely support my earlier coments ( the file size differences are not what I said they would be.) However they do show that your comparison of DB_HASH against DBD::SQLite is misleading (SQLite gets _creamed_ by a $DB_TREE), and they also show that if you are looking for high speed with a BerkeleyDB then you should definately go with a $DB_BTREE instead of a $DB_HASH.

My results were as follows:

Benchmark: timing 30000 iterations of 
	I B+Tree, I B+Tree (A), I Hash, I Hash (A), I SQLite, I SQLite 2...
    I B+Tree:  1 wallclock secs ( 0.88 usr +  0.11 sys =  0.98 CPU) @ 30487.80/s (n=30000)
I B+Tree (A):  1 wallclock secs ( 0.95 usr +  0.02 sys =  0.97 CPU) @ 30959.75/s (n=30000)
      I Hash:  8 wallclock secs ( 2.25 usr +  4.38 sys =  6.63 CPU) @ 4528.30/s  (n=30000)
  I Hash (A):  7 wallclock secs ( 2.31 usr +  4.58 sys =  6.89 CPU) @ 4353.50/s  (n=30000)
    I SQLite: 76 wallclock secs (14.84 usr + 49.66 sys = 64.50 CPU) @ 465.12/s   (n=30000)
  I SQLite 2:  4 wallclock secs ( 3.30 usr +  0.09 sys =  3.39 CPU) @ 8849.56/s  (n=30000)
  
                Rate I SQLite I Hash (A) I Hash I SQLite 2 I B+Tree I B+Tree (A)
I SQLite       465/s       --       -89%   -90%       -95%     -98%         -98%
I Hash (A)    4354/s     836%         --    -4%       -51%     -86%         -86%
I Hash        4528/s     874%         4%     --       -49%     -85%         -85%
I SQLite 2    8850/s    1803%       103%    95%         --     -71%         -71%
I B+Tree     30488/s    6455%       600%   573%       245%       --          -2%
I B+Tree (A) 30960/s    6556%       611%   584%       250%       2%           --



Benchmark: timing 30000 iterations of 
	S B+Tree, S B+Tree(A), S Hash, S Hash(A), S SQLite, S SQLite 2...
   S B+Tree:  1 wallclock secs ( 0.77 usr +  0.09 sys =  0.86 CPU) @ 34965.03/s (n=30000)
S B+Tree(A):  1 wallclock secs ( 0.95 usr +  0.05 sys =  1.00 CPU) @ 30000.00/s (n=30000)
     S Hash:  2 wallclock secs ( 1.03 usr +  1.63 sys =  2.66 CPU) @ 11295.18/s (n=30000)
  S Hash(A):  3 wallclock secs ( 1.47 usr +  1.36 sys =  2.83 CPU) @ 10608.20/s (n=30000)
   S SQLite: 11 wallclock secs ( 6.80 usr +  3.69 sys = 10.49 CPU) @ 2861.23/s  (n=30000)
 S SQLite 2:  4 wallclock secs ( 3.89 usr +  0.09 sys =  3.98 CPU) @ 7530.12/s  (n=30000)
 
               Rate S SQLite S SQLite 2 S Hash(A)    S Hash S B+Tree(A) S B+Tree
S SQLite     2861/s       --       -62%      -73%      -75%        -90%     -92%
S SQLite 2   7530/s     163%         --      -29%      -33%        -75%     -78%
S Hash(A)   10608/s     271%        41%        --       -6%        -65%     -70%
S Hash      11295/s     295%        50%        6%        --        -62%     -68%
S B+Tree(A) 30000/s     948%       298%      183%      166%          --     -14%
S B+Tree    34965/s    1122%       364%      230%      210%         17%       --

Resulting File Sizes:
bdb.hash.test :5226496
bdb.btree.test :6701056
bdb.hash.a.test :5095424
bdb.btree.a.test :3579904
sqlite.test :3845120
sqlite2.test :2043904
And the code I used to produce this benchmark is
use DB_File; use DBI; use Benchmark qw(cmpthese); use strict; use warnings; # do some setup my $id_BDB_hash = 0; my $id_BDB_hash_a = 0; my $id_BDB_btree_a = 0; my $id_BDB_btree = 0; my $id_lit2 = 0; my $id_lit = 0; my $txt = 'a' x 100; unlink( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a.test", "sqlite.test", "sqlite2.test" ); my $dbh = DBI->connect( 'dbi:SQLite:dbname=sqlite.test', '', '', { AutoCommi +t => 1, RaiseError => 1 } ); my $dbh2 = DBI->connect( 'dbi:SQLite:dbname=sqlite2.test', '', '', { AutoComm +it => 0, RaiseError => 1 } ); $dbh->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100)) +'); $dbh2->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100) +)'); $dbh->do('PRAGMA default_synchronous = off'); my $sth_ins = $dbh->prepare('INSERT INTO test (id, foo) values (?, ?) +'); my $sth_sel = $dbh->prepare('SELECT foo FROM test where id=?'); my $sth2_ins = $dbh2->prepare('INSERT INTO test (id, foo) values (?, ? +)'); my $sth2_sel = $dbh2->prepare('SELECT foo FROM test where id=?'); tie my %bdb_btree_a, "DB_File", "bdb.btree.a.test", O_RDWR | O_CREAT, +0640, $DB_BTREE or die "Cannot open file 'bdb.btree.a.test': $!\n"; tied(%bdb_btree_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 +) } ); tied(%bdb_btree_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) +} ); tie my %bdb_hash_a, "DB_File", "bdb.hash.a.test", O_RDWR | O_CREAT, 06 +40, $DB_HASH or die "Cannot open file 'bdb.hash.a.test': $!\n"; tied(%bdb_hash_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 ) + } ); tied(%bdb_hash_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) } + ); tie my %bdb_hash, "DB_File", "bdb.hash.test", O_RDWR | O_CREAT, 0640, +$DB_HASH or die "Cannot open file 'bdb.hash.test': $!\n"; tie my %bdb_btree, "DB_File", "bdb.btree.test", O_RDWR | O_CREAT, 0640 +, $DB_BTREE or die "Cannot open file 'bdb.btree.test': $!\n"; sub insertLite { $sth_ins->execute( $id_lit++, $txt ); } sub insertLite2 { $sth2_ins->execute( $id_lit2++, $txt ); } sub insertBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ } = $txt; } sub insertBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ } = $txt; } sub insertBDB_btree { $bdb_btree{ $id_BDB_btree++ } = $txt; } sub insertBDB_hash { $bdb_hash{ $id_BDB_hash++ } = $txt; } sub selectLite { $sth_sel->execute( $id_lit++ ); 1 while $sth_sel->fetch; } sub selectLite2 { $sth2_sel->execute( $id_lit2++ ); 1 while $sth2_sel->fetch; } sub selectBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ }; } sub selectBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ }; } sub selectBDB_btree { $bdb_btree{ $id_BDB_btree++ }; } sub selectBDB_hash { $bdb_hash{ $id_BDB_hash++ }; } cmpthese( 30_000, { 'I SQLite' => \&insertLite, 'I SQLite 2' => \&insertLite2, 'I Hash' => \&insertBDB_hash, 'I B+Tree' => \&insertBDB_btree, 'I Hash (A)' => \&insertBDB_hash_a, 'I B+Tree (A)' => \&insertBDB_btree_a, }, ); $id_BDB_hash = 0; $id_BDB_hash_a = 0; $id_BDB_btree_a = 0; $id_BDB_btree = 0; $id_lit2 = 0; $id_lit = 0; cmpthese( 30_000, { 'S SQLite' => \&selectLite, 'S SQLite 2' => \&selectLite2, 'S Hash' => \&selectBDB_hash, 'S B+Tree' => \&selectBDB_btree, 'S Hash(A)' => \&selectBDB_hash_a, 'S B+Tree(A)' => \&selectBDB_btree_a, }, ); foreach my $file ( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a +.test", "sqlite.test", "sqlite2.test" ) { print "$file :" . ( -s $file ) . "\n"; }
As Matts said, please no comments about the benchmarking code, I know its could be a lot nicer. (Its only a benchmark... :-)

Yves / DeMerphq
--
Benchmarking things isn't as easy or as straightforward as many people seem to think....

Replies are listed 'Best First'.
Re: New Benchmark: SQLite vs CDB_File vs BerkeleyDB
by Anonymous Monk on Jan 20, 2011 at 16:09 UTC
    Hi, years later, SQLite is better, and there are some optimisations possible.
    Rate I SQLite I Hash (A) I Hash I SQLite 2 I B+Tree (A +) I B+Tree I SQLite 18927/s -- -50% -51% -78% -80 +% -80% I Hash (A) 37975/s 101% -- -3% -56% -60 +% -60% I Hash 38961/s 106% 3% -- -55% -59 +% -59% I SQLite 2 86957/s 359% 129% 123% -- -9 +% -9% I B+Tree (A) 95238/s 403% 151% 144% 10% - +- 0% I B+Tree 95238/s 403% 151% 144% 10% 0 +% -- Rate S SQLite S Hash(A) S Hash S B+Tree(A) S SQLite +2 S B+Tree S SQLite 32787/s -- -44% -52% -63% -64 +% -68% S Hash(A) 58252/s 78% -- -16% -35% -37 +% -44% S Hash 68966/s 110% 18% -- -23% -25 +% -33% S B+Tree(A) 89552/s 173% 54% 30% -- -3 +% -13% S SQLite 2 92308/s 182% 58% 34% 3% - +- -11% S B+Tree 103448/s 216% 78% 50% 16% 12 +% --
    Here your modified benchmark code:
    use DB_File; use DBI; use Benchmark qw(cmpthese); use strict; use warnings; # do some setup my $id_BDB_hash = 0; my $id_BDB_hash_a = 0; my $id_BDB_btree_a = 0; my $id_BDB_btree = 0; my $id_lit2 = 0; my $id_lit = 0; my $txt = 'a' x 100; unlink( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a.test", "sqlite.test", "sqlite2.test" ); my $dbh = DBI->connect( 'dbi:SQLite:dbname=sqlite.test', '', '', { A +utoCommit => 1, RaiseError => 1 } ); my $dbh2 = DBI->connect( 'dbi:SQLite:dbname=sqlite2.test', '', '', { A +utoCommit => 0, RaiseError => 1 } ); $dbh->do('PRAGMA page_size = 4096'); $dbh->do('PRAGMA synchronous = off'); $dbh->do('PRAGMA cache_size = 6000'); $dbh->do('PRAGMA journal_mode = MEMORY'); $dbh->do('PRAGMA temp_store = MEMORY'); $dbh->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100)) +'); $dbh2->do('PRAGMA page_size = 4096'); $dbh2->do('PRAGMA cache_size = 6000'); $dbh2->do('PRAGMA journal_mode = MEMORY'); $dbh2->do('PRAGMA temp_store = MEMORY'); $dbh2->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100) +)'); my $sth_ins = $dbh->prepare('INSERT INTO test (id, foo) values (?, ?) +'); my $sth_sel = $dbh->prepare('SELECT foo FROM test where id=?'); my $sth2_ins = $dbh2->prepare('INSERT INTO test (id, foo) values (?, ? +)'); my $sth2_sel = $dbh2->prepare('SELECT foo FROM test where id=?'); tie my %bdb_btree_a, "DB_File", "bdb.btree.a.test", O_RDWR | O_CREAT, +0640, $DB_BTREE or die "Cannot open file 'bdb.btree.a.test': $!\n"; tied(%bdb_btree_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 +) } ); tied(%bdb_btree_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) +} ); tie my %bdb_hash_a, "DB_File", "bdb.hash.a.test", O_RDWR | O_CREAT, 06 +40, $DB_HASH or die "Cannot open file 'bdb.hash.a.test': $!\n"; tied(%bdb_hash_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 ) + } ); tied(%bdb_hash_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) } + ); tie my %bdb_hash, "DB_File", "bdb.hash.test", O_RDWR | O_CREAT, 0640, +$DB_HASH or die "Cannot open file 'bdb.hash.test': $!\n"; tie my %bdb_btree, "DB_File", "bdb.btree.test", O_RDWR | O_CREAT, 0640 +, $DB_BTREE or die "Cannot open file 'bdb.btree.test': $!\n"; sub insertLite { $sth_ins->execute( $id_lit++, $txt ); } sub insertLite2 { $sth2_ins->execute( $id_lit2++, $txt ); } sub insertBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ } = $txt; } sub insertBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ } = $txt; } sub insertBDB_btree { $bdb_btree{ $id_BDB_btree++ } = $txt; } sub insertBDB_hash { $bdb_hash{ $id_BDB_hash++ } = $txt; } sub selectLite { $dbh->selectall_arrayref($sth_sel, undef, $id_lit2++ ) } sub selectLite2 { $dbh2->selectall_arrayref($sth2_sel, undef, $id_lit2++ ) } sub selectBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ }; } sub selectBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ }; } sub selectBDB_btree { $bdb_btree{ $id_BDB_btree++ }; } sub selectBDB_hash { $bdb_hash{ $id_BDB_hash++ }; } cmpthese( 60_000, { 'I SQLite' => \&insertLite, 'I SQLite 2' => \&insertLite2, 'I Hash' => \&insertBDB_hash, 'I B+Tree' => \&insertBDB_btree, 'I Hash (A)' => \&insertBDB_hash_a, 'I B+Tree (A)' => \&insertBDB_btree_a, }, ); $dbh2->commit; $id_BDB_hash = 0; $id_BDB_hash_a = 0; $id_BDB_btree_a = 0; $id_BDB_btree = 0; $id_lit2 = 0; $id_lit = 0; cmpthese( 60_000, { 'S SQLite' => \&selectLite, 'S SQLite 2' => \&selectLite2, 'S Hash' => \&selectBDB_hash, 'S B+Tree' => \&selectBDB_btree, 'S Hash(A)' => \&selectBDB_hash_a, 'S B+Tree(A)' => \&selectBDB_btree_a, }, ); foreach my $file ( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a +.test", "sqlite.test", "sqlite2.test" ) { print "$file :" . ( -s $file ) . "\n"; }
      against which version of libdb?

        Probably 4.25

        With

        $ pmvers DBI DBD::SQLite DB_File BerkeleyDB DBI: 1.616 DBD::SQLite: 1.33 DB_File: 1.824 BerkeleyDB: 0.49 $ perl -MBerkeleyDB -le " print $BerkeleyDB::db_version 5.1 $ perl -MDB_File -le " print $DB_File::db_version " 5.1

        DBD-SQLite-1.33 has #define SQLITE_VERSION "3.7.6.3"

        Rate I SQLite I Hash (A) I Hash I B+Tree (A) I SQLite + 2 I B+Tree I SQLite 10079/s -- -60% -62% -83% -8 +9% -90% I Hash (A) 25263/s 151% -- -5% -57% -7 +2% -75% I Hash 26478/s 163% 5% -- -55% -7 +0% -74% I B+Tree (A) 58196/s 477% 130% 120% -- -3 +5% -42% I SQLite 2 89419/s 787% 254% 238% 54% +-- -12% I B+Tree 101180/s 904% 301% 282% 74% 1 +3% -- Rate S SQLite S Hash(A) S Hash S B+Tree(A) S B+Tree +S SQLite 2 S SQLite 16411/s -- -66% -72% -81% -82% + -85% S Hash(A) 48038/s 193% -- -19% -45% -46% + -56% S Hash 59055/s 260% 23% -- -32% -34% + -46% S B+Tree(A) 87209/s 431% 82% 48% -- -2% + -21% S B+Tree 89286/s 444% 86% 51% 2% -- + -19% S SQLite 2 109890/s 570% 129% 86% 26% 23% + -- bdb.hash.test :10469376 bdb.btree.test :13410304 bdb.hash.a.test :10616832 bdb.btree.a.test :7159808 sqlite.test :6668288 sqlite2.test :6668288

        It is now possible to compile libdb with sqlite3 interface, and link DBD::SQLite against it

        BerkeleyDB:SQL performance and concurrency is superior to SQLite in write-intensive applications and heavy workloads, allowing it to get higher TPS throughput

        4000 transactions per-second (TPS) scaling well up to 100 connections, whereas SQLite remains constant at 500 TPS,