NormalPerson has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I'm using the MLDBM module extensively to store persistent data structures that can be shared by different Perl scripts, typically to cache a hash of structs, each of which consists of a handful of scalars, three or four arrays, and a hash of hashes, e.g.:

    use GDBM_File;
    use MLDBM qw(GDBM_File Storable);
    use Thing;    ## (Thing is a package)

    tie (my %hash_db, "MLDBM", 'output.db', &GDBM_WRCREAT, 0606);

    foreach my $key (keys %hash_db) {
        my $current_thing = $hash_db{$key};
        ## do something with current_thing
        $hash_db{$key} = $current_thing;
    }

    untie %hash_db;

The programs that use MLDBM are command-line maintenance scripts, not web apps. The resulting output files (for about 10,000 keys) are about 10 MB. All operations, read or write, seem to be very fast. My only concern is how MLDBM uses memory: we are on a shared hosting plan that limits memory usage, so it is something we need to be aware of.

Does anyone know what actually happens when the MLDBM file is tied to a hash? My guess is that there is some kind of virtual memory scheme in the background, because reading the whole file into memory would not be sensible. It would be nice to know for sure.


Re: MLDBM and Memory
by tilly (Archbishop) on Jun 01, 2004 at 05:06 UTC
    There are two layers here to understand: the first is MLDBM, and the second is GDBM_File.

    What MLDBM does is take care of serializing and deserializing complex data structures. That is, when you store a value MLDBM flattens the nested Perl data structure into a string, and when you read it back MLDBM reconstructs the structure from that string. You could do the same thing yourself by tying straight to GDBM_File and using Storable to thaw your data after reading and freeze it before storing, as the sketch below shows. If the data structure for a single key/value pair gets very large, this round-trip could be a problem; if it is not large, it isn't. If that concerns you, you could look at DBM::Deep.
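
    A minimal sketch of roughly what MLDBM is doing for you, tying straight to GDBM_File and calling Storable by hand. The sample key and structure are made up for illustration; the file name and mode echo the original post:

        use strict;
        use warnings;
        use GDBM_File;
        use Storable qw(freeze thaw);

        # Tie to the raw DBM file; values here are plain byte strings.
        tie my %raw_db, 'GDBM_File', 'output.db', &GDBM_WRCREAT, 0606
            or die "Cannot tie output.db: $!";

        # Storing: flatten the nested structure to a string first.
        my $thing = { name => 'example', scores => [ 1, 2, 3 ] };
        $raw_db{some_key} = freeze($thing);

        # Reading: rebuild the structure from the stored string.
        my $restored = thaw( $raw_db{some_key} );
        print $restored->{name}, "\n";

        untie %raw_db;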

    The real question is GDBM_File, which is the Perl interface to the GNU dbm library. That library is the layer that decides how much data to read in, how much memory to use, and so on. You'll have to read the gdbm man page to find out how it handles memory and shared access. Odds are good that it keeps some kind of multi-level cache in RAM, but I don't know the size of those caches or how to tune them; I'd guess that its memory use is relatively limited.

    Another thing to try: run it and watch memory usage with standard tools (like top and ps). That does not give a definitive answer, but it gives you a good idea of what the usage actually looks like.
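
    If a short-lived script is hard to catch with top or ps, here is a rough way to spot-check memory from inside the script itself. This is my own sketch, not from the thread, and it assumes a Linux host where /proc/self/status is available:

        # Return the resident set size of the current process in kB,
        # or undef if /proc isn't available on this system.
        sub current_rss_kb {
            open my $fh, '<', '/proc/self/status' or return;
            while ( my $line = <$fh> ) {
                return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
            }
            return;
        }

        # e.g. print a reading every 1000 keys inside the loop:
        # print "RSS after $count keys: ", (current_rss_kb() || '?'), " kB\n"
        #     if ++$count % 1000 == 0;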
Re: MLDBM and Memory
by graff (Chancellor) on Jun 01, 2004 at 05:26 UTC
    With about 10K keys, this might not be a killer issue, but you might notice a difference:
        foreach my $key (keys %hash_db)    # creates a list in memory
                                           # containing all key strings

        # vs.

        while ( my ($key, $val) = each %hash_db )    # iterates over the DB entries one at a time
                                                     # and doesn't hold all keys in memory at once

      True. Sometimes we need the keys sorted, but using foreach over keys without a good reason is poor practice.
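
      When ordering matters you do have to materialize the full key list anyway, so each() only buys you anything when order doesn't matter. A quick illustration of the two patterns (hash name as in the post; treat them as separate loops, since calling keys resets the each iterator):

          for my $key ( sort keys %hash_db ) {    # builds the full key list, then sorts it
              my $thing = $hash_db{$key};
              # ...
          }

          while ( my ($key, $thing) = each %hash_db ) {    # one entry at a time, in hash order
              # ...
          }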