This is the kind of thing I start wondering about while riding the F train home to Brooklyn: what's faster for managing a set of medium-sized documents, BerkeleyDB or individual files? Some might think the answer is obvious, but consider that ext3 is a new-ish file system on Linux, and that the file system has the advantage of running in kernel space with full support from the VM system's caching. It seemed like a fair fight, so I benched it.
#!/usr/bin/perl -w
use strict;
use Benchmark qw(:all);
use BerkeleyDB;

my $file_dir = '/home/perrin/filetest';
my $db_dir   = '/home/perrin/dbdir';
my $db_file  = '/home/perrin/dbtest';

my %db;
my $env = new BerkeleyDB::Env(
    -Home  => $db_dir,
    -Flags => DB_INIT_CDB | DB_CREATE | DB_INIT_MPOOL
) or die "can't create BerkeleyDB::Env: $!";

my $db_obj = tie %db, 'BerkeleyDB::Btree',
    -Filename => $db_file,
    -Flags    => DB_CREATE,
    -Mode     => 0666,
    -Env      => $env
    or die $!;

sub read_file {
    my $key  = shift;
    my $file = "$file_dir/$key";
    open(FH, '<', $file) or die $!;
    local $/;
    my $value = <FH>;
    close FH;
    return $value;
}

sub write_file {
    my ($key, $value) = @_;
    my $file = "$file_dir/$key";
    open(FH, '>', $file) or die $!;
    print FH $value;
    close FH;
}

cmpthese(10, {
    'file write' => sub {
        for (0..1000) {
            write_file($_, $_ x 8000);
        }
    },
    'berkeley write' => sub {
        for (0..1000) {
            $db_obj->STORE($_, $_ x 8000);
        }
    },
});

cmpthese(10, {
    'file read' => sub {
        for (0..1000) {
            read_file($_);
        }
    },
    'berkeley read' => sub {
        for (0..1000) {
            $db_obj->FETCH($_);
        }
    },
});
And here's what I got:
[perrin@localhost perrin]$ ./file_bench.pl
Benchmark: timing 100 iterations of berkeley write, file write...
berkeley write: 24 wallclock secs (11.36 usr + 7.12 sys = 18.48 CPU) @ 5.41/s (n=100)
file write: 29 wallclock secs ( 8.67 usr + 8.19 sys = 16.86 CPU) @ 5.93/s (n=100)
                 Rate berkeley write file write
berkeley write 5.41/s             --        -9%
file write     5.93/s            10%         --
Benchmark: timing 100 iterations of berkeley read, file read...
berkeley read: 7 wallclock secs ( 3.92 usr + 3.21 sys = 7.13 CPU) @ 14.03/s (n=100)
file read: 5 wallclock secs ( 2.99 usr + 2.03 sys = 5.02 CPU) @ 19.92/s (n=100)
                Rate berkeley read file read
berkeley read 14.0/s            --      -30%
file read     19.9/s           42%        --
Look at the wallclock. Berkeley is faster at writing, but slower at reading. If you increase the number of records to 10000, the results are similar. At a very small record size (50 bytes), Berkeley comes out on top for both.

This is a Red Hat 8.0 system with the latest kernel on a P4 2.4 GHz machine with 512MB RAM.

Anyone out there want to try it on ReiserFS?

UPDATE: $BerkeleyDB::db_version == 4.0 and $BerkeleyDB::VERSION == 0.2.

UPDATE #2: Thanks to a tip from fellow subway rider Aaron Ross, I adjusted the Cachesize setting for BerkeleyDB and now it beats the file system by a significant margin. Below is the final code (including other suggestions):

#!/usr/bin/perl -w
use strict;
use Benchmark qw(:all);
use BerkeleyDB;

my $file_dir = '/home/perrin/filetest';
my $db_dir   = '/home/perrin/dbdir';
my $db_file  = '/home/perrin/dbtest';

my %db;
my $env = new BerkeleyDB::Env(
    -Home      => $db_dir,
    -Flags     => DB_INIT_CDB | DB_CREATE | DB_INIT_MPOOL,
    -Cachesize => 23152000,
) or die "can't create BerkeleyDB::Env: $!";

my $db_obj = tie %db, 'BerkeleyDB::Btree',
    -Filename => $db_file,
    -Flags    => DB_CREATE,
    -Mode     => 0666,
    -Env      => $env
    or die $!;

sub read_file {
    my $key  = shift;
    my $file = "$file_dir/$key";
    my $value;
    open(FH, '<', $file) or die $!;
    read FH, $value, (stat FH)[7];
    close FH;
    return $value;
}

sub slurp_file {
    my $key  = shift;
    my $file = "$file_dir/$key";
    local $/;
    open(FH, '<', $file) or die $!;
    my $value = <FH>;
    close FH;
    return $value;
}

sub sysread_file {
    my $key  = shift;
    my $file = "$file_dir/$key";
    my $value;
    open(FH, '<', $file) or die $!;
    sysread FH, $value, (stat FH)[7];
    close FH;
    return $value;
}

sub print_file {
    my ($key, $value) = @_;
    my $file = "$file_dir/$key";
    open(FH, '>', $file) or die $!;
    print FH $value;
    close FH;
}

sub write_file {
    my ($key, $value) = @_;
    my $file = "$file_dir/$key";
    open(FH, '>', $file) or die $!;
    print FH $value;
    close FH;
}

sub syswrite_file {
    my ($key, $value) = @_;
    my $file = "$file_dir/$key";
    open(FH, '>', $file) or die $!;
    syswrite FH, $value;
    close FH;
}

cmpthese(50, {
    'file write' => sub {
        for (0..1000) {
            write_file($_, $_ x 8000);
        }
    },
    'berkeley write' => sub {
        for (0..1000) {
            $db_obj->db_put($_, $_ x 8000);
        }
    },
    'file print' => sub {
        for (0..1000) {
            print_file($_, $_ x 8000);
        }
    },
    'file syswrite' => sub {
        for (0..1000) {
            syswrite_file($_, $_ x 8000);
        }
    },
});

cmpthese(100, {
    'file read' => sub {
        for (0..1000) {
            read_file($_);
        }
    },
    'file slurp' => sub {
        my $test;
        for (0..1000) {
            $test = slurp_file($_);
        }
    },
    'file sysread' => sub {
        my $test;
        for (0..1000) {
            $test = sysread_file($_);
        }
    },
    'berkeley read' => sub {
        my $v;
        for (0..1000) {
            $db_obj->db_get($_, $v);
        }
    },
});
This gives the following results:
Benchmark: timing 50 iterations of berkeley write, file print, file syswrite, file write...
berkeley write: 5 wallclock secs ( 5.17 usr + 0.02 sys = 5.19 CPU) @ 9.63/s (n=50)
file print: 10 wallclock secs ( 4.38 usr + 4.00 sys = 8.38 CPU) @ 5.97/s (n=50)
file syswrite: 11 wallclock secs ( 4.35 usr + 4.08 sys = 8.43 CPU) @ 5.93/s (n=50)
file write: 10 wallclock secs ( 4.37 usr + 4.26 sys = 8.63 CPU) @ 5.79/s (n=50)
                 Rate file write file syswrite file print berkeley write
file write     5.79/s         --            -2%        -3%           -40%
file syswrite  5.93/s         2%             --        -1%           -38%
file print     5.97/s         3%             1%         --           -38%
berkeley write 9.63/s        66%            62%        61%             --
Benchmark: timing 100 iterations of berkeley read, file read, file slurp, file sysread...
berkeley read: 4 wallclock secs ( 3.72 usr + 0.03 sys = 3.75 CPU) @ 26.67/s (n=100)
file read: 5 wallclock secs ( 2.71 usr + 2.01 sys = 4.72 CPU) @ 21.19/s (n=100)
file slurp: 6 wallclock secs ( 3.88 usr + 2.03 sys = 5.91 CPU) @ 16.92/s (n=100)
file sysread: 4 wallclock secs ( 2.49 usr + 1.91 sys = 4.40 CPU) @ 22.73/s (n=100)
                Rate file slurp file read file sysread berkeley read
file slurp    16.9/s         --      -20%         -26%          -37%
file read     21.2/s        25%        --          -7%          -21%
file sysread  22.7/s        34%        7%           --          -15%
berkeley read 26.7/s        58%       26%          17%            --
If you are using BerkeleyDB, make sure you tune that cache size! Use the db_stat utility and this document.
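As a sanity check on that cache setting (my own arithmetic; not spelled out in the thread), the 23152000 figure appears to be exactly the total size of the benchmark's working set: each stored value is $_ x 8000, i.e. 8000 bytes per digit of the key.

```perl
#!/usr/bin/perl
# Sketch: reproduce the -Cachesize figure from the benchmark's data set.
# Keys 0..1000 store $_ x 8000, so each value is 8000 bytes per digit.
use strict;
use warnings;

my $total = 0;
$total += length($_) * 8000 for 0 .. 1000;

print "total data size: $total bytes\n";  # prints "total data size: 23152000 bytes"
```

With the cache at least as large as the data, every read can be served from the memory pool, which would explain the jump in the "berkeley read" numbers.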

Re: BerkeleyDB vs. Linux file system
by merlyn (Sage) on Mar 18, 2003 at 13:01 UTC
    At a conference, I saw Tim Bray (father of XML) talk about managing large data sets, and he urged people to not overlook how darn efficient the filesystem, including the directory lookup code, has gotten in modern open source "unix" releases. Don't add a database or a substructure until you've proven its need through benchmarking.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: BerkeleyDB vs. Linux file system
by PodMaster (Abbot) on Mar 18, 2003 at 16:28 UTC
    I'd like to point out that files contain more info than simple key/value entries in BerkeleyDB.

    I'd also like to point out that you cannot read/write N bytes at a time with BerkeleyDB.

    Now, here are results for BerkeleyDB vs. NTFS:

    Benchmark: timing 10 iterations of berkeley write, file write...
    berkeley write: 26 wallclock secs ( 0.92 usr + 1.16 sys = 2.08 CPU) @ 4.81/s (n=10)
    file write: 54 wallclock secs ( 3.17 usr + 8.80 sys = 11.97 CPU) @ 0.84/s (n=10)
                   s/iter file write berkeley write
    file write       1.20         --           -83%
    berkeley write  0.208       476%             --
    Benchmark: timing 10 iterations of berkeley read, file read...
    berkeley read: 2 wallclock secs ( 0.53 usr + 0.55 sys = 1.08 CPU) @ 9.28/s (n=10)
    file read: 2 wallclock secs ( 1.88 usr + 0.89 sys = 2.77 CPU) @ 3.62/s (n=10)
                    Rate file read berkeley read
    file read     3.62/s        --          -61%
    berkeley read 9.28/s      157%            --
    E:\dev\LOOSE>perl BerkeleyDB.V.NTFS.pl
    Benchmark: timing 10 iterations of berkeley write, file write...
    berkeley write: 26 wallclock secs ( 0.94 usr + 1.20 sys = 2.14 CPU) @ 4.67/s (n=10)
    file write: 52 wallclock secs ( 3.28 usr + 8.14 sys = 11.42 CPU) @ 0.88/s (n=10)
                   s/iter file write berkeley write
    file write       1.14         --           -81%
    berkeley write  0.214       433%             --
    Benchmark: timing 10 iterations of berkeley read, file read...
    berkeley read: 1 wallclock secs ( 0.41 usr + 0.63 sys = 1.03 CPU) @ 9.70/s (n=10)
    file read: 3 wallclock secs ( 1.84 usr + 0.89 sys = 2.73 CPU) @ 3.66/s (n=10)
                    Rate file read berkeley read
    file read     3.66/s        --          -62%
    berkeley read 9.70/s      165%            --
    $BerkeleyDB::db_version = 4.0;
    $BerkeleyDB::VERSION = 0.18;
    # Microsoft Windows 2000 [Version 5.00.2195] (no other NTFS info available ;)


    MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
    I run a Win32 PPM repository for perl 5.6x+5.8x. I take requests.
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      Hmmm, pretty similar. I will look at using sysread/syswrite later on. That could make a difference, although the code becomes much more complicated.
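      The extra complication comes from sysread being allowed to return fewer bytes than requested. A minimal sketch (illustrative names, not from the benchmark above) of a read loop that handles short reads:

```perl
#!/usr/bin/perl
# Illustrative sketch: sysread may return fewer bytes than asked for,
# so a robust reader loops until it sees EOF (a return of 0).
use strict;
use warnings;

sub sysread_all {
    my $file = shift;
    open(my $fh, '<', $file) or die "open $file: $!";
    my ($value, $offset) = ('', 0);
    while (1) {
        my $got = sysread($fh, $value, 65536, $offset);
        die "sysread: $!" unless defined $got;
        last if $got == 0;    # EOF
        $offset += $got;
    }
    close $fh;
    return $value;
}

# quick self-check against a throwaway file
my $tmp = "/tmp/sysread_demo.$$";
open(my $out, '>', $tmp) or die $!;
print $out 'x' x 100_000;
close $out;
my $len = length sysread_all($tmp);
unlink $tmp;
print "read $len bytes\n";  # prints "read 100000 bytes"
```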
Re: BerkeleyDB vs. Linux file system
by thraxil (Prior) on Mar 18, 2003 at 16:58 UTC

    PIII 1GHz, ReiserFS 3.6.25 (compiled into the kernel) on a 10,000rpm SCSI drive, linux 2.4.19 (gentoo sources). $BerkeleyDB::db_version == 3.2, $BerkeleyDB::VERSION == 0.20

    Benchmark: timing 10 iterations of berkeley write, file write...
    berkeley write: 7 wallclock secs ( 4.01 usr + 3.24 sys = 7.25 CPU) @ 1.38/s (n=10)
    file write: 15 wallclock secs ( 3.00 usr + 10.51 sys = 13.51 CPU) @ 0.74/s (n=10)
                   s/iter file write berkeley write
    file write       1.35         --           -46%
    berkeley write  0.725        86%             --
    Benchmark: timing 10 iterations of berkeley read, file read...
    berkeley read: 2 wallclock secs ( 1.07 usr + 1.21 sys = 2.28 CPU) @ 4.39/s (n=10)
    file read: 1 wallclock secs ( 0.50 usr + 0.61 sys = 1.11 CPU) @ 9.01/s (n=10)
                    Rate berkeley read file read
    berkeley read 4.39/s            --      -51%
    file read     9.01/s          105%        --

    then i changed the 8000's to 800 and the 10's to 100 to see how it would do with smaller files:

    Benchmark: timing 100 iterations of berkeley write, file write...
    berkeley write: 11 wallclock secs ( 4.83 usr + 5.31 sys = 10.14 CPU) @ 9.86/s (n=100)
    file write: 68 wallclock secs ( 5.81 usr + 56.34 sys = 62.15 CPU) @ 1.61/s (n=100)
                     Rate file write berkeley write
    file write     1.61/s         --           -84%
    berkeley write 9.86/s       513%             --
    Benchmark: timing 100 iterations of berkeley read, file read...
    berkeley read: 5 wallclock secs ( 3.49 usr + 2.11 sys = 5.60 CPU) @ 17.86/s (n=100)
    file read: 4 wallclock secs ( 2.69 usr + 1.33 sys = 4.02 CPU) @ 24.88/s (n=100)
                     Rate berkeley read file read
    berkeley read 17.9/s            --      -28%
    file read     24.9/s           39%        --

    anders pearson

      Thanks. Looks like Reiser doesn't make much of a difference for this kind of thing.
Re: BerkeleyDB vs. Linux file system
by zby (Vicar) on Mar 18, 2003 at 09:12 UTC
    ext3 has no efficiency advantage over ext2; it might even be a bit slower because it keeps a journal.

      ext3 is slower, for the reasons you describe--it has to do everything ext2 does, plus a journal. Since it uses ext2 as a base (not just the fs itself, but the code implementation), it is impossible for it to be faster than ext2.

      ReiserFS is totally different. From the benchmarks I've seen, it's generally slower than ext2, but faster than ext3. YMMV.

      Update: Minor grammar mistake fixed.

      ----
      Reinvent a rounder wheel.

      Note: All code is untested, unless otherwise stated

      This is a good point. I think that what has improved in this case is really the kernel and the VM system, not the file system itself.
Re: BerkeleyDB vs. Linux file system
by Jost (Novice) on Mar 18, 2003 at 14:09 UTC
    To state the obvious:

    There is a lot of difference between having 100 5MB documents and 5 million 100-byte documents.

    If your test doesn't give performance differences, look at other parameters:
    Using the file system is invaluable during testing: you can just look at your files.
    OTOH, backing up millions of little files is a pain compared to backing up one database file.

    One more question: Do you need any kind of concurrency?
    This could also influence your decision one way or the other.

      I actually think that as long as you start splitting files across directories to keep from getting more than 1000 in a single dir, both file system and Berkeley would scale very far without a huge difference in performance. Remember, BerkeleyDB handles databases with terabytes of data.
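      A common way to do that splitting (a hypothetical sketch; the directory layout and names here are my own, not from the benchmark) is to bucket each key by a couple of characters of its digest:

```perl
#!/usr/bin/perl
# Hypothetical sketch: hash keys into two levels of subdirectories so no
# single directory accumulates more than a few hundred entries.
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Path qw(mkpath);

my $base_dir = '/tmp/filetest';  # illustrative location

sub path_for_key {
    my $key    = shift;
    my $digest = md5_hex($key);
    # two 2-hex-character levels => 65536 buckets
    my $dir = join '/', $base_dir, substr($digest, 0, 2), substr($digest, 2, 2);
    mkpath($dir) unless -d $dir;
    return "$dir/$key";
}

my $path = path_for_key(42);
print "$path\n";
```

Because the digest is effectively uniform, keys spread evenly across buckets no matter what the keys themselves look like.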

      There are definitely many advantages to having things in normal files, especially for text content, and it's the only choice for NFS or other file servers.

      I did use the DB_INIT_CDB flag, which initializes BerkeleyDB's concurrency support. If I leave that off, BerkeleyDB gets faster, but you lose the ability to do concurrent access. I didn't think the test would be very interesting if it used options that didn't allow for concurrency.

Re: BerkeleyDB vs. Linux file system reiserFS
by Tomte (Priest) on Mar 18, 2003 at 11:32 UTC

    Some interesting results, methinks: writing is exorbitantly costly, while reading is nearly equivalent (looking at the wallclock; real seconds are a somewhat different matter ;-).

    System: P4 2.4GHz, SuSE 8.1, perl 5.8.0 (Edit:512MB)

    Benchmark: timing 100 iterations of berkeley write, file write...
    berkeley write: 23 wallclock secs (12.55 usr + 9.56 sys = 22.11 CPU) @ 4.52/s (n=100)
    file write: 59 wallclock secs (11.78 usr + 44.22 sys = 56.00 CPU) @ 1.79/s (n=100)
                     Rate file write berkeley write
    file write     1.79/s         --           -61%
    berkeley write 4.52/s       153%             --
    Benchmark: timing 100 iterations of berkeley read, file read...
    berkeley read: 8 wallclock secs ( 3.70 usr + 4.78 sys = 8.48 CPU) @ 11.79/s (n=100)
    file read: 7 wallclock secs ( 3.00 usr + 3.15 sys = 6.15 CPU) @ 16.26/s (n=100)
                     Rate berkeley read file read
    berkeley read 11.8/s            --      -27%
    file read     16.3/s           38%        --

    regards,
    tomte


      I've heard that Reiser does well with many small files, so maybe it would hold up better on 50 byte files than my ext3 system does.
Re: BerkeleyDB vs. Linux file system
by diotalevi (Canon) on Mar 18, 2003 at 16:46 UTC

    perrin is using BerkeleyDB incorrectly, and it's showing up slower than it should: replace the ->STORE and ->FETCH methods with ->db_put and ->db_get. Or, at least, if you're going to use the OO interface, use it correctly. This usage is halfway between the tied interface and the OO interface. I think that, to be more meaningful, the benchmark should have picked a style and stuck with it (OO, of course, since that's how I use it *smirk*).

    Some other people noted that they'd much prefer that you read from the FH handle using read() instead of readline().
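    To make the distinction concrete, here is a minimal side-by-side sketch of the two styles (the file path and key are illustrative; assumes the BerkeleyDB module is installed):

```perl
#!/usr/bin/perl
# Sketch of the tied interface vs. the OO interface to the same database.
use strict;
use warnings;
use BerkeleyDB;

my %db;
my $db_obj = tie %db, 'BerkeleyDB::Btree',
    -Filename => '/tmp/demo.db',   # illustrative path
    -Flags    => DB_CREATE
    or die "tie failed: $!";

# Tied style: plain hash syntax, routed through perl's tie magic
# (an extra layer of indirection on every access).
$db{answer} = 42;
print "tied: $db{answer}\n";

# OO style: call the underlying methods directly; both return 0 on success.
$db_obj->db_put('answer', 42) == 0 or die "db_put failed";
my $value;
$db_obj->db_get('answer', $value) == 0 or die "db_get failed";
print "oo: $value\n";
```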

      Results using the slow functions

      Benchmark: timing 10 iterations of berkeley write, file write...
      berkeley write: 132 wallclock secs (32.46 usr + 20.78 sys = 53.24 CPU) @ 0.19/s (n=10)
      file write: 61 wallclock secs (24.18 usr + 12.72 sys = 36.90 CPU) @ 0.27/s (n=10)
                     s/iter berkeley write file write
      berkeley write   5.32             --       -31%
      file write       3.69            44%         --
      Benchmark: timing 10 iterations of berkeley read, file read...
      berkeley read: 179 wallclock secs (11.48 usr + 7.47 sys = 18.95 CPU) @ 0.53/s (n=10)
      file read: 225 wallclock secs ( 7.72 usr + 6.81 sys = 14.53 CPU) @ 0.69/s (n=10)
                    s/iter berkeley read file read
      berkeley read   1.89            --      -23%
      file read       1.45           30%        --

      Results using the faster functions. This shows a very nice boost to file read and a modest boost to BerkeleyDB read and write.

      Benchmark: timing 10 iterations of berkeley write, file write...
      berkeley write: 96 wallclock secs (25.64 usr + 21.62 sys = 47.26 CPU) @ 0.21/s (n=10)
      file write: 58 wallclock secs (23.88 usr + 13.41 sys = 37.29 CPU) @ 0.27/s (n=10)
                     s/iter berkeley write file write
      berkeley write   4.73             --       -21%
      file write       3.73            27%         --
      Benchmark: timing 10 iterations of berkeley read, file read...
      berkeley read: 163 wallclock secs (10.58 usr + 7.83 sys = 18.41 CPU) @ 0.54/s (n=10)
      file read: 135 wallclock secs ( 8.21 usr + 6.12 sys = 14.33 CPU) @ 0.70/s (n=10)
                    s/iter berkeley read file read
      berkeley read   1.84            --      -22%
      file read       1.43           28%        --

      My alteration to perrin's benchmark

      --- perrin-bench.pl     Tue Mar 18 11:41:42 2003
      +++ perrin-bench2.pl    Tue Mar 18 12:00:06 2003
      @@ -28,9 +28,9 @@
       sub read_file {
           my $key  = shift;
           my $file = "$file_dir/$key";
      +    my $value;
           open(FH, '<', $file) or die $!;
      -    local $/;
      -    my $value = <FH>;
      +    read FH, $value, (stat FH)[7];
           close FH;
           return $value;
       }
      @@ -52,20 +52,22 @@
           'berkeley write' => sub {
               for (0..1000) {
      -            $db_obj->STORE($_, $_ x 8000);
      +            $db_obj->db_put($_, $_ x 8000);
               }
           },
       });
       
       cmpthese(10, {
           'file read' => sub {
      +        my $test;
               for (0..1000) {
      -            read_file($_);
      +            $test = read_file($_);
               }
           },
           'berkeley read' => sub {
      +        my $test;
               for (0..1000) {
      -            $db_obj->FETCH($_);
      +            $db_obj->db_get($_,$test);
               }
           },
       });
        Interesting, your results are just the opposite of mine! Berkeley is slower at writing and faster at reading (before the switch to read) in yours. Must be OpenBSD.

        I'd like to see if sysread/syswrite make much of a difference too. I'll try that later on Linux.

      Actually, I did try db_get and db_put; there was no significant difference in the results.

        I see a difference between FETCH/STORE and db_put/db_get. All this confirms for me is that BerkeleyDB is fast enough. I'm just glad it competes nicely with the file system (which as you said has all sorts of in-kernel advantages). My own system is OpenBSD 3.2 using the GENERIC kernel on a 233 MMX pentium using ATA-100 discs in "PIO mode 4, Ultra-DMA mode 5" (whatever that means).

Re: BerkeleyDB vs. Linux file system
by mojotoad (Monsignor) on Mar 18, 2003 at 19:23 UTC
    Benchmarking aside, I'm supposing that you don't mind being forced into a programmatic mode of access for the files in question while using BerkeleyDB? In other words, there's no reason to access the files in question with something as mundane as vi?

    That's a huge consideration for me. As an example of how to go terribly wrong with this, consider what IBM did with AIX -- all system configuration files were mirrored in a database. As a consequence, most unix system configuration commands had an AIX-centric version that would ensure the database was synchronized with the flat files. Yuck.

    Matt

      The focus here is file systems as a database, not databases as a file system. I completely agree that it would be foolhardy to put your configuration data into a database of any kind without some sort of good reason (and definitely not anything /etc-like).