pachkov has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I need your wisdom.

In my script I read some data into a hash (a hash of arrays). It is quite big and takes around 1 GB of memory. Then I start reading another file line by line, and if a field matches a hash key, it is printed out.

As soon as I start printing, memory consumption grows like crazy, resulting in an "out of memory" error. The total amount of memory on the working machine is 4 GB.

How can I reduce the memory usage?

*** Solved! See my comment underneath. ***

My script looks like this:

#####################
my %hash = get_hash();

open(IN,   "$in");
open(OUT1, "> $out1");
open(OUT2, "> $out2");
open(OUT3, "> $out3");

while (<IN>) {
    my @data = split /\s+/, $_;
    if (defined($hash{$data[0]})) {
        print OUT1 "$data[0]\n";
        print OUT2 join("\t", @data[1..$#data]) . "\n";
        print OUT3 join("\t", @{$hash{$data[0]}}) . "\n";
    }
}

Thank you in advance!

Best,

Mike

Re: Memory consumption
by BrowserUk (Patriarch) on May 06, 2009 at 09:57 UTC

    This statement is duplicating the entire hash:

    my %hash = get_hash();

    Instead of:

    sub get_hash {
        my %hash;
        ## populate %hash;
        ...
        return %hash;
    }
    ...
    my %hash = get_hash();
    ...
    if( defined( $hash{ $data[0] } ) ) {

    Use:

    sub get_hash {
        my %hash;
        ## populate %hash;
        ...
        return \%hash;
    }
    ...
    my $hashRef = get_hash();
    ...
    if( defined( $hashRef->{ $data[0] } ) ) {   ## Note the arrow .......^^
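
    For context, here is a minimal, self-contained sketch of that reference-returning pattern. It is not BrowserUk's or the OP's actual code: the data built in get_hash() and the lines in @lines are made up purely for illustration.

    use strict;
    use warnings;

    # Build the big structure once and hand back a single scalar (the reference),
    # so the caller never copies the whole key/value list.
    sub get_hash {
        my %hash;
        for my $i ( 1 .. 10 ) {                      # placeholder data only
            $hash{"id$i"} = [ $i, $i * 2, $i * 3 ];
        }
        return \%hash;
    }

    my $hashRef = get_hash();

    my @lines = ( "id3 foo bar", "id42 baz" );       # stand-in for the second input file
    for my $line (@lines) {
        my @data = split /\s+/, $line;
        next unless defined $hashRef->{ $data[0] };  # arrow dereference on the ref
        print join( "\t", @{ $hashRef->{ $data[0] } } ), "\n";
    }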

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      In fact I do it the way you have written; the posted code is simplified. But the problem was not in populating the hash, it was in printing out some values. Anyway, I have found the solution and feel great!

        In general, on this forum I think you will find that simplifying code in that way (e.g. converting references into hashes or arrays) will only confuse us and make us do extra work (like the code BrowserUK posted). This is especially true when there are performance or memory issues. There are plenty of people here who are quite comfortable with complex data structures.

        Best, beth

Re: Memory consumption
by pachkov (Novice) on May 06, 2009 at 09:54 UTC

    Posting here was quite stimulating for my thinking!

    The solution is simple. Instead of printing the dereferenced array from the hash directly, I first assign it to a local variable, which is then printed.

    # before
    print OUT3 join("\t", @{$hash{$data[0]}}) . "\n";

    # now
    my @data1 = @{$hash{$data[0]}};
    print OUT3 join("\t", @data1) . "\n";

    I can only guess that, before, some memory was being allocated for an anonymous array when dereferencing the hash value, and it was not being released for the next cycle.

    Any comments from a Perl memory-management guru are very welcome!

      I can only guess that, before, some memory was being allocated for an anonymous array when dereferencing the hash value, and it was not being released for the next cycle.

      That's very unlikely, or else millions of Perl programs would show the same symptoms. We'd have to see more of your program to give better suggestions.
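
      Not something raised in this thread, but if you want to see where the memory actually goes, the CPAN module Devel::Size can report how much a structure occupies. A minimal sketch, assuming the module is installed and using toy data:

      use strict;
      use warnings;
      use Devel::Size qw(total_size);   # total_size() follows references

      my %hash = ( a => [ 1 .. 1000 ], b => [ 1 .. 1000 ] );

      # reports the bytes used by the hash plus the arrays it points to
      printf "hash of arrays: %d bytes\n", total_size( \%hash );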

Re: Memory consumption
by Anonymous Monk on May 06, 2009 at 09:44 UTC
    Refactor the part that eats memory, my %hash = get_hash();, maybe with MLDBM (or DB_File), or DBD::SQLite... or maybe just flatten it so each value is a simple tab-joined string, e.g. "hash\tof\tvalues" (see the sketch below).
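
    A sketch of the "flatten" idea, assuming the values only ever need to be printed back out joined with tabs, so there is no need to keep them as real arrays (the @records data here is made up):

    use strict;
    use warnings;

    my %hash;
    my @records = ( "id1 2 4 6", "id2 3 6 9" );   # stand-in for the big input

    # store one pre-joined string per key instead of an array reference;
    # a single scalar carries far less per-element overhead than an array
    for my $record (@records) {
        my ( $key, @values ) = split /\s+/, $record;
        $hash{$key} = join "\t", @values;
    }

    # printing is then just the stored string
    print "$hash{id1}\n" if defined $hash{id1};
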
      DBM::Deep is a better choice than MLDBM these days, as access to arbitrarily deeply buried data is far more transparent. To immediately address the OP's concerns, it is far more memory efficient than MLDBM too!
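
      A minimal sketch of what that might look like for a hash of arrays; the file name and keys are made up, not taken from the OP's data:

      use strict;
      use warnings;
      use DBM::Deep;

      # the structure lives on disk in "lookup.db" instead of in RAM
      my $db = DBM::Deep->new( "lookup.db" );

      # populate once (toy data)
      $db->{id1} = [ 10, 20, 30 ];
      $db->{id2} = [ 40, 50 ];

      # later: lookups work like an ordinary hash reference
      if ( exists $db->{id1} ) {
          print join( "\t", @{ $db->{id1} } ), "\n";
      }
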
        Thank you, I couldn't remember the name and I was looking for it in DBD-