kevyt has asked for the wisdom of the Perl Monks concerning the following question:

Hi Everyone. The code I have below might attempt to store up to a million keys on any given day and it runs out of memory. How can I store a huge amount of data while using a minimal amount of memory? It reads several files with the following format:

some_name 3 19
A_name 9 2
while (<IN>) {
    @tmp = split;
    $key = shift @tmp;
    $cache{$key}[0] += $tmp[0];
    $cache{$key}[1] += $tmp[1];
}
close IN;

foreach (keys %cache) {
    print OUT $_, " ", join(" ", @{$cache{$_}}), "\n";
}
close OUT;

Replies are listed 'Best First'.
Re: How can I make a hash use less memory
by ignatz (Vicar) on Oct 11, 2002 at 17:59 UTC
    Try a database.

    The premise that you need to store millions of items in memory is flawed. No amount of optimization is going to save you when you need such a huge amount of memory.

    ()-()
     \"/
      `                                                     
    

      And to further expand on ignatz, you can switch to something like DB_File (which is in core perl) or BerkeleyDB and not have to change your code much. You keep your hash - it's just tied to a database. I've used BerkeleyDB to good effect for working with large datasets that didn't fit into memory.

      Update: you'll probably also want to use something to serialize your structures as well: Storable, MLDBM, FreezeThaw, Data::Dumper, etc. You can't just store raw structures, but the idea is the same.
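
A minimal sketch of the tied-hash idea, assuming DB_File is available in your perl build. The file name `cache.db` and the helper `add_line` are made up for illustration. Since the tied hash can only hold plain strings, the two totals are kept as one space-separated string rather than an array reference:

```perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT
use DB_File;    # exports $DB_HASH

# Hypothetical helper: add one input line to the running totals.
# It works on any hash reference, tied or not.
sub add_line {
    my ($cache, $line) = @_;
    my ($key, $x, $y) = split ' ', $line;
    return unless defined $y;    # skip blank or malformed lines

    # Values must be plain strings, so both totals share one string.
    my ($sum_x, $sum_y) = split ' ', ($cache->{$key} || '0 0');
    $cache->{$key} = join ' ', $sum_x + $x, $sum_y + $y;
    return;
}

# Hypothetical file name; the totals live on disk, not in RAM.
my $db_file = 'cache.db';
tie my %cache, 'DB_File', $db_file, O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot tie $db_file: $!";

add_line(\%cache, $_) while <STDIN>;

# each() walks the database one record at a time; keys() would build
# the whole key list in memory first.
while (my ($key, $totals) = each %cache) {
    print "$key $totals\n";
}

untie %cache;
```

Note that the database file persists between runs, so delete it first if each run should start from zero.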

      __SIG__ printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE
Re: How can I make a hash use less memory
by talexb (Chancellor) on Oct 11, 2002 at 17:57 UTC

    You don't say which part runs out of memory, the load or the display.

    In any case, it appears that you can divide and conquer: try running the script on smaller chunks, then add the intermediate results together to get a final result.
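
One possible way to chunk the work, sketched under my own assumptions rather than as the method meant above: partition the input by a checksum of the key, so every line for a given key lands in the same chunk. Each chunk can then be totalled on its own and the outputs simply concatenated. The bucket count, the `chunk.N` file names and the helper `bucket_for` are all made up for illustration:

```perl
use strict;
use warnings;

my $buckets = 16;    # assumed chunk count; raise it for bigger inputs

# Pick a bucket from a byte checksum of the key, so that every line
# for a given key always lands in the same chunk file.
sub bucket_for {
    my ($key) = @_;
    return unpack('%32C*', $key) % $buckets;
}

# Pass 1: split the input into chunk files (hypothetical names).
my @out;
for my $i (0 .. $buckets - 1) {
    open $out[$i], '>', "chunk.$i" or die "chunk.$i: $!";
}
while (my $line = <STDIN>) {
    my ($key) = split ' ', $line;
    next unless defined $key;    # skip blank lines
    print { $out[ bucket_for($key) ] } $line;
}
close $_ for @out;

# Pass 2: total each chunk separately; only one chunk's keys are in
# memory at any time.
for my $i (0 .. $buckets - 1) {
    my %cache;
    open my $in, '<', "chunk.$i" or die "chunk.$i: $!";
    while (<$in>) {
        my ($key, $x, $y) = split;
        next unless defined $y;  # skip malformed lines
        $cache{$key}[0] += $x;
        $cache{$key}[1] += $y;
    }
    close $in;
    print "$_ @{ $cache{$_} }\n" for keys %cache;
    unlink "chunk.$i";
}
```

Peak memory then scales with the largest chunk instead of the whole key set, at the cost of writing and re-reading the data once.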

    --t. alex
    but my friends call me T.
      Thanks Alex, I was thinking about smaller chunks or maybe using 3 arrays. I parse all of this stuff and then it is loaded into a database.
Re: How can I make a hash use less memory
by bigj (Monk) on Oct 11, 2002 at 18:20 UTC
    One way to save a bit of memory is to avoid storing the data in a hash of arrays. A simple hash with string values needs less memory per key. I'm imagining pseudocode like:
    while (<IN>) {
        my ($key, $x, $y) = split;
        # Totals are kept as one "x y" string per key instead of an array.
        my @from_cache = split ' ', (defined $cache{$key} ? $cache{$key} : '');
        $from_cache[0] += $x;
        $from_cache[1] += $y;
        $cache{$key} = join ' ', @from_cache;
    }
    But of course, using a database is the right long-term answer, as my solution is only a dirty hack :-)
      thanks to everyone. I will give this a try.
Re: How can I make a hash use less memory
by Elian (Parson) on Oct 11, 2002 at 18:43 UTC
    Definitely use a database, as a number of folks have already pointed out. If you go all the way and use a relational DB, you can incrementally load into it as new data becomes available, then use whatever query tools you want (or write) to work on the bits of the database you need to. It's the best way once things get reasonably big, and you're well past 'reasonable' here. :)

    If you're curious as to the current size of things, you can always play with the Devel::Size module, which'll figure out how much memory your hash is using right now.
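
A small sketch of measuring a hash with Devel::Size, assuming the module has been installed from CPAN. The two-entry `%cache` is stand-in data taken from the sample lines in the question, and the numbers reported will vary by perl version and platform:

```perl
use strict;
use warnings;

# Small stand-in for the real %cache from the question.
my %cache = (
    some_name => [ 3, 19 ],
    A_name    => [ 9, 2 ],
);

# Devel::Size is a CPAN module, so it is loaded at run time and the
# script still runs where it is not installed.
if ( eval { require Devel::Size; 1 } ) {
    # total_size() follows the array references; size() would count
    # only the hash structure itself.
    my $bytes = Devel::Size::total_size( \%cache );
    printf "%d keys use about %d bytes\n", scalar( keys %cache ), $bytes;
}
else {
    print "Devel::Size is not installed\n";
}
```

Dividing the reported bytes by the key count gives a rough per-key cost to extrapolate from.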

Re: How can I make a hash use less memory
by hossman (Prior) on Oct 12, 2002 at 05:55 UTC

    Why bother trying to store "a huge amount of data" in memory, when all you are doing is processing it linearly?

    Sort your huge data file (by the key) first, and then process it with something like this...

    #!/usr/local/bin/perl
    use warnings;
    use strict;

    my ($key, $first, $second) = split ' ', <>;
    while (<>) {
        my @line = split;
        if ($key eq $line[0]) {
            $first  += $line[1];
            $second += $line[2];
        } else {
            print "$key $first $second\n";
            ($key, $first, $second) = @line;
        }
    }
    print "$key $first $second\n";

    (something like "sort your/big/ass/data/file.txt | monk.pl")

Re: How can I make a hash use less memory
by BrowserUk (Patriarch) on Oct 11, 2002 at 19:24 UTC

    perlfaq3 offers this advice concerning using less memory.

    How can I make my Perl program take less memory?

    In some cases, using substr() or vec() to simulate arrays can be highly beneficial. For example, an array of a thousand booleans will take at least 20,000 bytes of space, but it can be turned into one 125-byte bit vector--a considerable memory savings. The standard Tie::SubstrHash module can also help for certain types of data structure. If you're working with specialist data structures (matrices, for instance) modules that implement these in C may use less memory than equivalent Perl modules.
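
The bit-vector figure from the FAQ can be checked with a short sketch using the built-in vec(). The variable name `$flags` and the flag positions are arbitrary choices for illustration:

```perl
use strict;
use warnings;

my $flags = '';    # the bit vector is just a string

# Writing the highest bit first grows the string to its full size.
vec( $flags, 999, 1 ) = 0;

# Turn on a few flags.
vec( $flags, $_, 1 ) = 1 for 0, 42, 999;

print "bytes used: ", length($flags), "\n";    # 1000 bits = 125 bytes
print "flag 42 is ", vec( $flags, 42, 1 ), "\n";
print "flag 43 is ", vec( $flags, 43, 1 ), "\n";
```

This only helps when the data is a dense range of small integers; it does not apply directly to string keys like the ones in the original question.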

    I played around with Tie::SubstrHash a while ago, and my results were inconclusive for my purposes. Depending upon your data, though, it might prove more frugal than a standard hash.


    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
Re: How can I make a hash use less memory
by MZSanford (Curate) on Oct 11, 2002 at 19:37 UTC

    I was going to suggest DB_File, but I see diotalevi already said that.

    <silly>
    I did notice that the following was missing :

    use less 'memory';
    it is even in the manual.

    </silly>
    from the frivolous to the serious