mr.nick has asked for the wisdom of the Perl Monks concerning the following question:

Okay, I would like a little help here. I'm sorting through some data and want to store it in a hash of hashes ($hash{key}{subkey}=$value). No biggie. The problem is that the data exceeds 800MB in size when fully loaded.

So I considered using tie and related functions to store the hash on disk. Of course, you can't have two-dimensional hashes via tie. But since my first key is sparse (around 20 entries), I thought I could create a tied hash for each value of that key, so that the subkeys would live on disk.

My attempt looked something like this:
    use Fcntl;       # exports O_RDWR and O_CREAT
    use NDBM_File;

    sub checkdbm {
        my $hashref = shift;
        my $key     = shift;
        if (-f "$key-dbm.db") { return; }
        tie(%{$hashref->{$key}}, 'NDBM_File', "$key-dbm", O_RDWR|O_CREAT, 0640);
    }
with the invocation of:
    checkdbm \%terms, $db;
    for my $t (@terms) { $terms{$db}{$t}++; }
But alas, I can't get it to work. The code executes and the key-dbm.db files are created, but no data is ever stored in them; the program grows in memory until even switching virtual consoles takes minutes.

Can anyone see what my problem is? (Ahem, my *programming* problem). I suspect I don't understand precisely how tie works and that is biting me in the butt.

TIA!

Re: Tie, DBM's, HoH and Sparse Keys
by runrig (Abbot) on Apr 10, 2001 at 22:47 UTC
    Try creating a tied hash inside the subroutine, then store a reference to that hash in the hashref passed in:
    sub {
        my ($hashref, $key) = @_;
        my %hash;
        tie %hash, ....;
        $hashref->{$key} = \%hash;
    }
    Actually, though, I'd think about using a database, starting with maybe DBD::RAM, then moving on to an indexable database.
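    A minimal runnable sketch of the tie-inside-the-sub idea, under a few assumptions: SDBM_File stands in for NDBM_File because it is built with every perl, the DBM files go into a temporary directory, and the input terms are made up. It also checks whether the key is already present in %$hashref instead of testing for the file with -f, so a second run still ties (and reopens) an existing file rather than falling back to a plain in-memory hash.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical location for the DBM files; a temp dir keeps the demo self-contained.
my $dir = tempdir(CLEANUP => 1);

# Tie a fresh lexical hash to a DBM file named after $key, then store a
# reference to it. The tie survives because %$hashref keeps the hash alive.
sub checkdbm {
    my ($hashref, $key) = @_;
    return if exists $hashref->{$key};    # already tied during this run
    my $file = File::Spec->catfile($dir, "$key-dbm");
    my %hash;
    tie(%hash, 'SDBM_File', $file, O_RDWR | O_CREAT, 0640)
        or die "Cannot tie $file: $!";
    $hashref->{$key} = \%hash;
}

# Hypothetical input: two top-level keys, each seeing the same terms.
my %terms;
for my $db (qw(alpha beta)) {
    checkdbm(\%terms, $db);
    $terms{$db}{$_}++ for qw(foo bar foo);   # each ++ is a FETCH then a STORE on disk
}
print "$_: foo=$terms{$_}{foo} bar=$terms{$_}{bar}\n" for sort keys %terms;
```

    One caveat with the SDBM_File substitution: it limits each key/value pair to about 1 KB, so a real data set may need NDBM_File, GDBM_File, or DB_File instead.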
Re: Tie, DBM's, HoH and Sparse Keys
by Asim (Hermit) on Apr 10, 2001 at 22:52 UTC
    If I'm reading you right, I'd consider something like, say, Storable to hold your data.
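    A sketch of how Storable might fit, assuming the work can be done one top-level key at a time. Storable serializes a whole structure in memory, so it only keeps the footprint down if each sub-hash is built, written to its own file, and dropped before the next one starts. The directory, file names, and %input data here are hypothetical.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);
use File::Spec;

my $dir = tempdir(CLEANUP => 1);   # hypothetical location for the per-key files

# Hypothetical input: top-level key => list of terms seen for it.
my %input = (
    alpha => [qw(foo bar foo)],
    beta  => [qw(baz)],
);

# Build one sub-hash at a time, write it to its own file, then let it go out of scope.
for my $db (sort keys %input) {
    my %counts;
    $counts{$_}++ for @{ $input{$db} };
    store(\%counts, File::Spec->catfile($dir, "$db.sto"))
        or die "Cannot store $db: $!";
}

# Later, load only the sub-hash you actually need.
my $alpha = retrieve(File::Spec->catfile($dir, 'alpha.sto'));
print "alpha: foo=$alpha->{foo} bar=$alpha->{bar}\n";
```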

    ----Asim, known to some as Woodrow.

      Yeah, that would be better. But I'm still curious, why didn't my method work?
Re: Tie, DBM's, HoH and Sparse Keys
by jeroenes (Priest) on Apr 11, 2001 at 13:43 UTC
    Didn't you mean to write:
    checkdbm \%terms, $db;
    for my $t (keys %{$terms{$db}}) { $terms{$db}{$t}++; }
    ?

    Jeroen
    "We are not alone"(FZ)

Re: Tie, DBM's, HoH and Sparse Keys
by TheoPetersen (Priest) on Apr 11, 2001 at 00:13 UTC
    I tried this (Perl 5.6.1, RedHat 6.0) and found something weird.
    checkdbm \%terms, $db;
    print "hashref =", tied($terms{$db}), "\n";
    prints "hashref =" followed by a "Use of uninitialized value" warning. So the tiedness of the hash reference is getting lost somewhere.
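    That particular test may be misleading, though: tied($terms{$db}) asks whether the hash element $terms{$db} (a plain scalar holding a reference) is tied, not whether the hash it points to is, so it returns undef even when the tie succeeded. tied(%{ $terms{$db} }) asks the intended question. A small demonstration, using the core Tie::StdHash class in place of NDBM_File so it runs anywhere:

```perl
use strict;
use warnings;
use Tie::Hash;   # provides Tie::StdHash, a minimal tied-hash class

my %terms;
$terms{db1} = {};                       # the element holds a hash reference
tie %{ $terms{db1} }, 'Tie::StdHash';   # tie the hash that reference points to

# tied() on the element inspects the scalar $terms{db1}, which is not tied.
my $on_scalar = tied $terms{db1};
# tied() on the dereferenced hash inspects the hash that was actually tied.
my $on_hash = tied %{ $terms{db1} };

print defined $on_scalar ? "element is tied\n" : "element is not tied\n";
print "hash is tied to: ", ref($on_hash), "\n";
```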
      Yeah, that didn't work quite as expected, so I followed runrig's suggestion and changed it to:
      sub checkdbm {
          my $hashref = shift;
          my $key     = shift;
          if (-f "$key-dbm.db") { return; }
          my %hash;
          tie(%hash, 'NDBM_File', "$key-dbm", O_RDWR|O_CREAT, 0640);
          $hashref->{$key} = \%hash;
      }
      which works... sorta. The DB files fill with the correct information, but my memory usage isn't any lower. Odd, no?

      Well, off to Storable I suppose (and I was really trying to get THIS to work ... for stubbornness's sake).

        If speed is not of the essence...

        Could you try untying the hashes in between operations on particular top-level keys? That should release the NDBM_File buffers and what-not.
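        A sketch of that suggestion, assuming the input can be grouped by top-level key ahead of time (the %batches data and file names are hypothetical) and again using SDBM_File in place of NDBM_File: each key's file is tied, updated, and untied before the next key is opened, so only one DBM file is open inside the loop at any moment.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

my $dir = tempdir(CLEANUP => 1);   # hypothetical directory for the DBM files

# Hypothetical input, already grouped by top-level key.
my %batches = (
    alpha => [qw(foo bar foo)],
    beta  => [qw(baz baz)],
);

for my $db (sort keys %batches) {
    my $file = File::Spec->catfile($dir, "$db-dbm");
    my %counts;
    tie(%counts, 'SDBM_File', $file, O_RDWR | O_CREAT, 0640)
        or die "Cannot tie $file: $!";
    $counts{$_}++ for @{ $batches{$db} };
    untie %counts;   # close this key's file before opening the next one
}

# Reopen one file to confirm the counts were written to disk.
tie(my %check, 'SDBM_File', File::Spec->catfile($dir, 'beta-dbm'), O_RDWR, 0640)
    or die "Cannot reopen beta-dbm: $!";
print "beta: baz=$check{baz}\n";
```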