in reply to MLDBM Efficiency

That's an interesting question (really). Before reading Mastering Algorithms with Perl (O'Reilly & Associates), I would have assumed that as you add elements to a conventional Perl hash there must be some increase in the amount of time it takes to look up one of the elements. Even after reading MAwP, which gives an introduction to how hashes are implemented, I might have thought that the order of growth as you add elements would be O(log N), which is a fancy way of saying that as you add elements there is a relatively small growth in the length of time it takes to perform lookups.

"...might have thought...", I say, because MAwP puts that theory to rest and sets the record straight when it specifically states that a well written hashing algorithm will yield O(1) order of growth, which is a fancy way of saying that there is no growth in the computational work of looking up hash elements as the hash grows large.

Here is a test verifying that theory with respect to Perl's internal hashes.

use Benchmark qw/:all/; @lrg_hash{ 0 .. 20000 } = (); @sml_hash{ 0 .. 10 } = (); $sml_hash{ 5000 } = undef; # Create a hash key of the same name in # %sml_hash as in %lrg_hash. cmpthese ( 500000, { Big => '$lrg_hash{5000}=1;', Little => '$sml_hash{5000}=1;' } ); __OUTPUT__ Rate Big Little Big 378788/s -- -1% Little 381679/s 1% --

Clearly there is an insignificant time difference.

I'm not familiar with MLDBM's internal algorithm, but if it is a well written hash, as are Perl's hashes, you can expect O(1) order of growth. Go ahead and let the DB grow to 50,000 elements, or 500,000 elements, it won't matter. To confirm this, I ran the following benchmark:

use MLDBM; use Benchmark qw/:all/; use Fcntl; $dbL = tie %lrg_hash, 'MLDBM', 'lrgtest.dbm', O_CREAT|O_RDWR, 0640 or die $!; $dbS = tie %sml_hash, 'MLDBM', 'smltest.dbm', O_CREAT|O_RDWR, 0640 or die $!; @lrg_hash{ 0 .. 20000 } = (); @sml_hash{ 0 .. 10 } = (); $sml_hash{ 5000 } = undef; # Create a hash key of the same name in # %sml_hash as in %lrg_hash. cmpthese ( 5000, { Big => '$lrg_hash{5000}=1;', Little => '$sml_hash{5000}=1;' } ); __OUTPUT__ Rate Big Little Big 3247/s -- -4% Little 3378/s 4% --

Although the DB access slows things down considerably, there is no appreciable difference between the speed of looking up a hash element in the 20,000 element hash, and the 11 element hash.


Dave