pachkov has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I need your wisdom.

In my script I read some data into a hash (a hash of arrays). It is quite big and takes around 1 GB of memory. Then I start reading another file line by line, and if a field matches a hash key, it is printed out.

As soon as I start printing, memory consumption grows like crazy, resulting in an "out of memory" error. The total amount of memory on the working machine is 4 GB.

How can I reduce the memory usage?

*** Solved! See my comment underneath. ***

My script looks like this:

#####################
my %hash = get_hash();

open(IN,   "$in");
open(OUT1, "> $out1");
open(OUT2, "> $out2");
open(OUT3, "> $out3");

while (<IN>) {
    my @data = split /\s+/, $_;
    if (defined($hash{$data[0]})) {
        print OUT1 "$data[0]\n";
        print OUT2 join("\t", @data[1..$#data]) . "\n";
        print OUT3 join("\t", @{$hash{$data[0]}}) . "\n";
    }
}

Thank you in advance!

Best,

Mike

Re: Memory consumption
by BrowserUk (Patriarch) on May 06, 2009 at 09:57 UTC

    This statement is duplicating the entire hash:

    my %hash = get_hash();

    Instead of:

    sub get_hash {
        my %hash;
        ## populate %hash;
        ...
        return %hash;
    }
    ...
    my %hash = get_hash();
    ...
    if( defined( $hash{ $data[0] } ) ) {

    Use:

    sub get_hash {
        my %hash;
        ## populate %hash;
        ...
        return \%hash;
    }
    ...
    my $hashRef = get_hash();
    ...
    if( defined( $hashRef->{ $data[0] } ) ) {   ## Note the arrow .......^^
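
    For context, here is a minimal, self-contained sketch of that reference-returning pattern. It is not BrowserUk's or the OP's actual code: the data built in get_hash() and the lines in @lines are made up purely for illustration.

    use strict;
    use warnings;

    # Build the big structure once and hand back a single scalar (the reference),
    # so the caller never copies the whole key/value list.
    sub get_hash {
        my %hash;
        for my $i ( 1 .. 10 ) {                      # placeholder data only
            $hash{"id$i"} = [ $i, $i * 2, $i * 3 ];
        }
        return \%hash;
    }

    my $hashRef = get_hash();

    my @lines = ( "id3 foo bar", "id42 baz" );       # stand-in for the second input file
    for my $line (@lines) {
        my @data = split /\s+/, $line;
        next unless defined $hashRef->{ $data[0] };  # arrow dereference on the ref
        print join( "\t", @{ $hashRef->{ $data[0] } } ), "\n";
    }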

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      In fact I do it the way you have written; the posted code is simplified. But the problem was not in populating the hash, it was in printing out some values. Anyway, I have found the solution and feel great!

        In general, on this forum I think you will find that simplifying code in that way (e.g. converting references into hashes or arrays) will only confuse us and make us do extra work (like the code BrowserUK posted). This is especially true when there are performance or memory issues. There are plenty of people here who are quite comfortable with complex data structures.

        Best, beth

Re: Memory consumption
by pachkov (Novice) on May 06, 2009 at 09:54 UTC

    Posting here was quite stimulating for my thinking!

    The solution is simple. Instead of printing the dereferenced array from the hash directly, I first assign it to a local variable, which is then printed.

    # before
    print OUT3 join("\t", @{$hash{$data[0]}}) . "\n";

    # now
    my @data1 = @{$hash{$data[0]}};
    print OUT3 join("\t", @data1) . "\n";

    I can only guess that, before, some memory was being allocated for an anonymous array when dereferencing the hash value, and it was not being released for the next cycle.

    Any comments from a Perl memory-management guru are very welcome!

      I can only guess that, before, some memory was being allocated for an anonymous array when dereferencing the hash value, and it was not being released for the next cycle.

      That's very unlikely, or else millions of Perl programs would show the same symptoms. We'd have to see more of your program to give better suggestions.
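
      Not something raised in this thread, but if you want to see where the memory actually goes, the CPAN module Devel::Size can report how much a structure occupies. A minimal sketch, assuming the module is installed and using toy data:

      use strict;
      use warnings;
      use Devel::Size qw(total_size);   # total_size() follows references

      my %hash = ( a => [ 1 .. 1000 ], b => [ 1 .. 1000 ] );

      # reports the bytes used by the hash plus the arrays it points to
      printf "hash of arrays: %d bytes\n", total_size( \%hash );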

Re: Memory consumption
by Anonymous Monk on May 06, 2009 at 09:44 UTC
    Refactor the part that eats memory, my %hash = get_hash();, maybe with MLDBM (or DB_File), or DBD::SQLite... or maybe just flatten it so each value is a simple tab-joined string, e.g. "hash\tof\tvalues" (see the sketch below).
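
    A sketch of the "flatten" idea, assuming the values only ever need to be printed back out joined with tabs, so there is no need to keep them as real arrays (the @records data here is made up):

    use strict;
    use warnings;

    my %hash;
    my @records = ( "id1 2 4 6", "id2 3 6 9" );   # stand-in for the big input

    # store one pre-joined string per key instead of an array reference;
    # a single scalar carries far less per-element overhead than an array
    for my $record (@records) {
        my ( $key, @values ) = split /\s+/, $record;
        $hash{$key} = join "\t", @values;
    }

    # printing is then just the stored string
    print "$hash{id1}\n" if defined $hash{id1};
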
      DBM::Deep is a better choice than MLDBM these days, as access to arbitrarily deeply buried data is far more transparent. To immediately address the OP's concerns, it is far more memory efficient than MLDBM too!
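
      A minimal sketch of what that might look like for a hash of arrays; the file name and keys are made up, not taken from the OP's data:

      use strict;
      use warnings;
      use DBM::Deep;

      # the structure lives on disk in "lookup.db" instead of in RAM
      my $db = DBM::Deep->new( "lookup.db" );

      # populate once (toy data)
      $db->{id1} = [ 10, 20, 30 ];
      $db->{id2} = [ 40, 50 ];

      # later: lookups work like an ordinary hash reference
      if ( exists $db->{id1} ) {
          print join( "\t", @{ $db->{id1} } ), "\n";
      }
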
        Thank you, I couldn't remember the name and I was looking for it in DBD-