MatthewFrancis has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,

I need to work with large, complex data structures (~100,000 rows, hashes of hashes, etc.) and don't want to overload the memory on our Solaris Unix machine. I've heard about Perl's DBM modules, MLDBM, etc., and thought these would be the answer to my problems. However, when I attempted to use MLDBM (chosen because, per the literature, it allows storage of complex data structures, direct modifications, etc.), I found I couldn't use hash-like functions such as exists. Is there another commonly used and distributed DBM module that allows this sort of thing? What would be considered the "industry standard" for this type of activity these days?

Thank you,
MatthewFrancis

Re: Perl DBM
by mirod (Canon) on Apr 13, 2004 at 15:07 UTC

    If your data can be modelled as tables, a relational database, especially a lightweight one like SQLite (used through DBI with DBD::SQLite), would probably be a good fit.
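A minimal sketch of that approach, assuming DBD::SQLite is installed; the file name items.db and the table and column names are hypothetical. It loads rows in a single transaction and uses a SELECT on the primary key as the counterpart of exists $h{...}.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical file name; the whole database lives in this one file.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=items.db', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

# Hypothetical table: one row per key, one column per inner hash field.
$dbh->do('CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, a TEXT)');

# Load many rows inside one transaction; committing per row is far slower.
$dbh->begin_work;
my $ins = $dbh->prepare('INSERT OR REPLACE INTO items (id, a) VALUES (?, ?)');
$ins->execute( $_, "value $_" ) for 1 .. 1000;
$dbh->commit;

# Counterpart of exists $h{42}: is there a row with that key?
my ($found) = $dbh->selectrow_array( 'SELECT 1 FROM items WHERE id = ?', undef, 42 );

# Counterpart of $h{42}{a}: fetch one column of that row.
my ($a_val) = $dbh->selectrow_array( 'SELECT a FROM items WHERE id = ?', undef, 42 );

print "row 42 exists: ", ( $found ? 'yes' : 'no' ), ", a = $a_val\n";
$dbh->disconnect;
```

Only the rows a query touches are read into memory, which is the point when the data set is too large to hold as one big hash.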

      Thanks for the reply...

      The quick answer is that we don't have access to a relational database for this project, unfortunately. However, thanks for the SQLite recommendation - I'd not heard of it before.

      Thanks again,
      MatthewFrancis
        we don't have access to a relational database for this project

        If you can install Perl modules then you can install SQLite: it comes bundled with DBD::SQLite. It is a simple C library, and the database is a single file, which means no daemon running, no port to open and no admin to do (access rights are the access rights to the file). From an admin point of view it is just like a flat file. Using it is (or at least should be ;--) really a low-level implementation decision.

      Fitting arbitrary complex data into a relational database may be hard. Very hard. Using MLDBM or similar (DBM::Deep looks interesting) is much easier. To the original questioner: are you using DB_File or GDBM_File as the database backend? The default SDBM_File has many drawbacks (limited record size; no EXISTS method before Perl 5.6.1).
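For comparison, a minimal sketch with DBM::Deep, assuming it has been installed from CPAN; the file name deep.db is hypothetical. Nested structures are stored directly, a nested element can be updated in place, and exists and delete behave as on an ordinary hash.

```perl
use strict;
use warnings;
use DBM::Deep;

# Hypothetical file name; created on first use, reopened afterwards.
my $db = DBM::Deep->new('deep.db');

# Nested structures are stored as-is; no separate serializer is configured.
$db->{1} = { a => 'first' };
$db->{2} = { a => 'second' };

# Unlike MLDBM, a nested element can be updated in place.
$db->{1}{a} = 'changed';

# exists and delete work as on an ordinary hash.
print "key 1 exists\n" if exists $db->{1};
delete $db->{2};

for my $key ( sort keys %$db ) {
    print "key: $key value: $db->{$key}{a}\n";
}
```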
        Thanks for the reply,

        I'm using things as follows:
        use SDBM_File;
        use MLDBM qw( SDBM_File );
        Again, I'm a total newbie to DBM issues, so if there are better ways to do this I'd be grateful for some pointers.
        Thanks!!
        MatthewFrancis
Re: Perl DBM
by perrin (Chancellor) on Apr 13, 2004 at 17:05 UTC
    Actually exists() should work fine with all dbm files. They function like hashes. Can you give an example of the problem you're having?
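A minimal sketch of a plain DBM-tied hash using the core SDBM_File, assuming a perl whose SDBM_File defines EXISTS (5.6.1 or later, per the reply above); the temporary file name is hypothetical. Once tied, store, fetch, exists, delete and keys all go to the file. Values must be flat strings, which is why nested data needs a layer such as MLDBM.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
tie( my %h, 'SDBM_File', "$dir/plain", O_RDWR | O_CREAT, 0666 )
    or die "couldn't tie: $!\n";

# Flat string values only: a reference would be stored as "HASH(0x...)".
$h{1} = 'first';
$h{2} = 'second';

my $has1 = exists $h{1};        # true
my $has3 = exists $h{3};        # false
delete $h{2};
my @left = sort keys %h;        # only '1' remains

print "1: ", ( $has1 ? 'yes' : 'no' ), ", 3: ", ( $has3 ? 'yes' : 'no' ),
      ", left: @left\n";
untie %h;
```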
      Hi,
      Thanks for the response. The following works w/ the tie-statement commented out (as below):
      use SDBM_File;
      use MLDBM qw(SDBM_File);
      use Fcntl;

      #tie (%h, 'MLDBM', 'dbm_file', O_CREAT|O_RDWR, 0666) || die "couldn't open tie-file\n";

      $h{ "1" } = { "a" => "first" };
      $h{ "2" } = { "a" => "second" };
      $h{ "3" } = { "a" => "third" };

      foreach ( keys %h )
      {
          $value = ${ $h{ $_ } }{ "a" };
          print "key: .$_. value: .$value.\n";
      };

      if ( exists $h{ "1" } )    # line 18 of dbm.pl
      {
          $h{ "1" } = [ "a", "b", "c" ];

          foreach ( @{ $h{ "1" } } )
          {
              print "value is: .$_.\n";
          };
      };
      I get:
      key: .1. value: .first.
      key: .2. value: .second.
      key: .3. value: .third.
      value is: .a.
      value is: .b.
      value is: .c.
      but, if I uncomment the tie, I get:
      key: .1. value: .first.
      key: .2. value: .second.
      key: .3. value: .third.
      SDBM_File doesn't define an EXISTS method at dbm.pl line 18
      Any help appreciated. I'm using MLDBM because, per literature I encountered, regular DBM doesn't permit complex data-structures.
      Thanks again -
      MatthewFrancis
        You could just use defined() in this case, but exists() does work on most dbm implementations. Instead of SDBM_File, try DB_File, GDBM_File, or NDBM_File.
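A sketch of the test script above with DB_File swapped in as the MLDBM backend, assuming MLDBM is installed and this perl was built with DB_File; the temporary file location is hypothetical. It also shows MLDBM's fetch-modify-store pattern for changing a nested value, since MLDBM serializes each top-level value as a whole.

```perl
use strict;
use warnings;
use Fcntl;                         # O_CREAT, O_RDWR
use DB_File;
use MLDBM qw(DB_File Storable);    # backend first, then serializer
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
tie( my %h, 'MLDBM', "$dir/dbm_file", O_CREAT | O_RDWR, 0666 )
    or die "couldn't open tie-file: $!\n";

$h{1} = { a => 'first' };
$h{2} = { a => 'second' };

# DB_File defines EXISTS, so this no longer dies. With a backend that
# lacks it, defined $h{1} is a workable substitute here, because every
# stored value is a reference and therefore never undef.
my $found = exists $h{1};

# A nested change has to be fetch, modify, store; assigning $h{1}{a}
# directly would only change a temporary deserialized copy.
my $rec = $h{1};
$rec->{a} = 'changed';
$h{1} = $rec;

my $now = $h{1}{a};
print "found: ", ( $found ? 'yes' : 'no' ), ", value: $now\n";
untie %h;
```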
Re: Perl DBM
by jZed (Prior) on Apr 13, 2004 at 15:31 UTC
    I second the comment that you'll be best off in the long run with DBI, and that DBD::SQLite is a good choice if you aren't ready for a full-powered database system. If you can get by with something even lighter, or if you already have data in DBM formats, then <shameless plug> try my new DBD::DBM. It handles all formats of DBM, is faster than SQLite for many simple operations, and is as lightweight as they come: just install DBI version 1.42 or higher, and DBD::DBM comes with it.
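A minimal sketch of SQL access to a DBM file through DBI and DBD::DBM, assuming DBI 1.42 or later; the table name fruit, its column names and the temporary directory are hypothetical. With default settings (SDBM_File, no MLDBM) a table has exactly two columns, a key and a value.

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw(tempdir);

# f_dir tells DBD::DBM where to keep the DBM files for each table.
my $dir = tempdir( CLEANUP => 1 );
my $dbh = DBI->connect( "dbi:DBM:f_dir=$dir", undef, undef,
    { RaiseError => 1, PrintError => 0 } );

# Two columns: the first is the DBM key, the second its value.
$dbh->do('CREATE TABLE fruit (fkey INT, fval VARCHAR(20))');

my $ins = $dbh->prepare('INSERT INTO fruit VALUES (?, ?)');
$ins->execute( 1, 'apple' );
$ins->execute( 2, 'banana' );

my ($val) = $dbh->selectrow_array(
    'SELECT fval FROM fruit WHERE fkey = ?', undef, 2 );
print "2 => $val\n";
$dbh->disconnect;
```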

    DBD::DBM can work with BerkeleyDB and MLDBM and both of those are also handy on their own. See also the pod for AnyDBM_File for a brief comparison of different DBM types.
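A minimal sketch of AnyDBM_File, which ties to the first DBM module it can load from @AnyDBM_File::ISA; setting that array in a BEGIN block before loading chooses the preference order. The list below is an assumption for illustration: DB_File first, falling back to the core SDBM_File, both of which take Fcntl-style open flags.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use File::Temp qw(tempdir);

# Must run before AnyDBM_File is loaded, hence BEGIN.
BEGIN { @AnyDBM_File::ISA = qw(DB_File SDBM_File) }
use AnyDBM_File;

my $dir = tempdir( CLEANUP => 1 );
tie( my %h, 'AnyDBM_File', "$dir/any", O_RDWR | O_CREAT, 0640 )
    or die "couldn't tie: $!\n";

$h{key} = 'value';

# After loading, @AnyDBM_File::ISA holds only the module actually chosen.
my $backend = $AnyDBM_File::ISA[0];
my $found   = exists $h{key};
print "backend: $backend, key exists: ", ( $found ? 'yes' : 'no' ), "\n";
untie %h;
```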