Jman1 has asked for the wisdom of the Perl Monks concerning the following question:

As a followup to my previous question, I essentially have to create a huge array of arrays of arrays of arrays, but the program dies apparently because it runs out of memory. (It gives no error message or warning, just stops.) This is a simplification of the problem:

    use strict;
    use warnings;

    my %hash = ();
    foreach my $a (1..500) {
        foreach my $b (1..4000) {
            foreach my $c (1..10) {
                foreach my $d (1..7) {
                    $hash{$a}{$b}{$c}{$d} = $d;
                }
            }
        }
    }
    print "here\n";

That program dies without reaching the print statement. I'm running Windows XP.

What's my best solution here? Obviously I could write everything to disk somehow, but that seems like it would be pretty slow, especially since I'll need to access it a lot to do computations. Do I need a database? Help! :)

Replies are listed 'Best First'.
Re: My data structure needs too much memory!
by Roy Johnson (Monsignor) on Dec 13, 2005 at 16:55 UTC
    Using arrays instead of hashes will save you maybe an order of magnitude of memory, if the indexes are really numbers in the ranges you show in your example. I don't know whether that would be enough.

    How are you really using the data structure?


    Caution: Contents may have been coded under pressure.
      That sample code produces the same result (i.e. none) when the data structure is a 4-dim array.

      Basically I'm using the data structure to run comparisons of two lists of bags. For each item in list a, for each item in list b, there are about 7 bags of words. I need to compare all the bags of each item in list a to all the bags of each item in list b, using operations like "intersect."
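      For reference, an "intersect" over two word bags can be sketched with plain hashes; the bag contents here are made-up placeholders, not the real data:

```perl
use strict;
use warnings;

# Hypothetical sketch: each bag is a hash keyed by word,
# which gives O(1) membership tests.
my %bag_a = map { $_ => 1 } qw(apple banana cherry);
my %bag_b = map { $_ => 1 } qw(banana cherry date);

# Intersection: keep the words of bag A that also appear in bag B.
my @common = grep { $bag_b{$_} } keys %bag_a;

print scalar(@common), " words in common\n";   # banana and cherry
```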

        That data structure is going to take up about 11.7 gigabytes of memory. And that's with the degenerate case you give, where each leaf of the HoHoHoH holds a single-digit integer as its data. (Reduce the outer loop to 10 so it will fit into memory, put in a busy/wait loop so you have time to look up the memory usage before it exits, then look in Task Manager to get the memory usage.)

        use strict;
        use warnings;

        my %hash = ();
        foreach my $a (1..10) {
            foreach my $b (1..4000) {
                foreach my $c (1..10) {
                    foreach my $d (1..7) {
                        $hash{$a}{$b}{$c}{$d} = $d;
                    }
                }
            }
            print "$a\n";
        }
        print "here\n";
        next while (!<STDIN>);

        I come up with about 247,000 KB used. Take out about 2K for the perl interpreter overhead, multiply by 50, divide by 1024**2... 11.7 GB.

        Even converting to arrays, it takes up about 5 Gigabytes.

        It would seem that either you are going to have to rework your algorithm or use a database. I'm not clear what exactly you are attempting to accomplish.

        That sounds like a database job to me.

        Caution: Contents may have been coded under pressure.
        I need to compare all the bags of each item in list a to all the bags of each item in list b, using operations like "intersect."

        Somehow, that doesn't seem to explain why you have four nested dimensions in a hash structure (or in an array structure, which would seem more appropriate based on what you've told us).

        Maybe you can decompose the problem so that you use hashes or arrays of just two dimensions? Just a thought...
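        One way that decomposition can look, as a sketch: collapse the nesting by joining the indices into a single composite key, so there is one hash instead of hundreds of thousands of tiny nested ones (the tiny hashes are where most of the memory goes). The values here are placeholders:

```perl
use strict;
use warnings;

my %flat;

# One composite key instead of $hash{7}{1}{3}{2} = 2.
$flat{ join ',', 7, 1, 3, 2 } = 2;

# Lookup uses the same key construction.
my $val = $flat{ join ',', 7, 1, 3, 2 };
print "$val\n";   # 2
```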

Re: My data structure needs too much memory!
by pileofrogs (Priest) on Dec 13, 2005 at 17:12 UTC

    It seems to me that you should be able to come up with a mathematical expression or function that lets you do something like f($a,$b,$c) = $d.

    I guess I'm saying, are you sure you need to do all that calculation ahead of time? If you do it at the time of your comparison, it'll probably take a lot more CPU (a LOOOOT more CPU), but then at least it won't bust your RAM.
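    That trade can be sketched as computing each value on demand and caching only the combinations actually touched. The compute_d routine below is a hypothetical stand-in for whatever calculation really produces $d:

```perl
use strict;
use warnings;

my %cache;

# Hypothetical stand-in for the real calculation.
sub compute_d {
    my ($a, $b, $c, $d) = @_;
    return $d;    # in the real program, a genuine computation
}

# Lazy lookup: only combinations that are actually used
# ever occupy memory.
sub get_d {
    my ($a, $b, $c, $d) = @_;
    my $key = "$a,$b,$c,$d";
    $cache{$key} = compute_d($a, $b, $c, $d)
        unless exists $cache{$key};
    return $cache{$key};
}

print get_d(7, 12, 3, 5), "\n";   # 5
```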

Re: My data structure needs too much memory!
by dave_the_m (Monsignor) on Dec 13, 2005 at 18:48 UTC
    $hash{$a}{$b}{$c}{$d} = $d;
    If you really do just need to store small numbers (e.g. in the range 0..7) using only small contiguous numeric indices, then you can store the numbers with minimal memory using vec(), at the cost of slower access:
    $hash{$a}{$b}{$c}{$d} = $d;                                # before
    vec($data, $a*4000*10*8 + $b*10*8 + $c*8 + $d, 8) = $d;    # after
    That would use approx 60 MB rather than the several GB before. My example vec line assumes you use indexes from 0 rather than 1.
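    Reading a value back uses the same offset expression. A small self-contained check of the packing, using the 0-based index math above (the sample indices are arbitrary):

```perl
use strict;
use warnings;

my $data = '';
my ($a, $b, $c, $d) = (3, 1500, 4, 6);

# Each slot is 8 bits; the offset counts 8-bit elements.
my $off = $a*4000*10*8 + $b*10*8 + $c*8 + $d;

vec($data, $off, 8) = $d;               # store
print vec($data, $off, 8), "\n";        # fetch: 6
```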
Re: My data structure needs too much memory!
by artist (Parson) on Dec 13, 2005 at 17:25 UTC
    If we knew more about the problem, we might be able to come up with a solution to the real problem, rather than one for the given data structure.
    --Artist