techtruth has asked for the wisdom of the Perl Monks concerning the following question:
Hi PerlMonks,
I have a question about Perl and the way it handles RAM with the regex binding operator (=~).
Preface: I am running Debian Squeeze with 1024 MB of RAM, and my Perl version is 5.10.1. I have chosen not to upgrade my RAM.
Note: I think it is important to mention that the string I wish to run a global match against is around 50 MB. :(
Problem: I am attempting to return an array from a global match on a very large string (my @array = $somebigstring =~ /regex/g) but it eats up (in my understanding) far more memory than it should.
My (failing) Solutions:
I have used undef() on both $somebigstring and @array, which frees an amount of memory more or less equal to their size. This still leaves nearly all of my RAM taken up by some unknown data.
Writing the data to a file, then processing it line by line
My Thoughts: I believe that the binding operator may set some small variables for each match. The collective size of these small variables becomes very large if many matches are made, as in a global match.
Are my assumptions correct, and if so is there a nice way to tell perl to release or not store that "extra" data? I am aware that perl has a memory "pool" that does not release memory to the OS until the script is over.
I will provide a code snippet below, and would like any suggestions on how to handle the memory hogging.
    print "\tProcessing data...\n";
    foreach (@links) {
        my $link = $_;

        # Get data from the internet
        my $httpReply = $browser->get($link);

        ####### This is my problem, the regex match eats up a lot of RAM.
        my @data = $httpReply->content =~ /regex/g;
        undef($httpReply);   # Frees memory round the size of the webpage.
        #undef(@data);       # Frees memory around the size of the array.
        #######

        ####### If the RAM is nearly full, I am unable to completely store the values in a hash.
        # Add each extracted data to the hash
        while (scalar @data) {
            my $line = shift(@data);
            my ($field1, $field2) = split(':', lc($line));

            # Sort the emails into a hash of arrays. key = domain; value = [username, reference]
            push(@{$emailHash{$field2}}, [$field1, $link]);
        }
    }
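For comparison, the loop above could be restructured so that the full list of matches is never built. This is only a sketch under stated assumptions, not a confirmed fix for the memory growth described: the helper name `collect_pairs`, the sample string, the example URL, and the pattern `[\w.]+:[\w.]+` are all made up for illustration and stand in for the real `/regex/`. It runs the match in scalar context, so each pass of the `while` loop yields one match, and it reads the match offsets from `@-` and `@+` rather than using capture groups.

```perl
use strict;
use warnings;

# Hypothetical helper: walks the matches one at a time instead of
# building the whole list with "my @data = $string =~ /regex/g".
sub collect_pairs {
    my ($content_ref, $link, $hash_ref) = @_;

    # Scalar-context //g resumes from pos() of the target string on each
    # pass, so the target has to be a real variable rather than a fresh
    # value returned by a method call.
    while ($$content_ref =~ /[\w.]+:[\w.]+/g) {
        # $-[0] and $+[0] are the start and end offsets of the last match.
        my $line = substr($$content_ref, $-[0], $+[0] - $-[0]);
        my ($field1, $field2) = split /:/, lc $line;

        # key = domain; value = [username, reference]
        push @{ $hash_ref->{$field2} }, [ $field1, $link ];
    }
    return;
}

# Assumed sample data standing in for a downloaded page.
my %emailHash;
my $content = "Alice:Example.com noise Bob:example.com junk carol:other.org";
collect_pairs(\$content, 'http://example.invalid/page', \%emailHash);

printf "%s => %d\n", $_, scalar @{ $emailHash{$_} } for sort keys %emailHash;
```

In the original loop this would mean copying `$httpReply->content` into a variable once and passing a reference to it. Whether this lowers peak memory on a given perl build is something to measure rather than assume.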
Replies are listed 'Best First'.

Re: Binding operator RAM eater
by johngg (Canon) on Apr 17, 2012 at 22:29 UTC
by GrandFather (Saint) on Apr 17, 2012 at 22:51 UTC

Re: Binding operator RAM eater
by RichardK (Parson) on Apr 17, 2012 at 21:48 UTC

Re: Binding operator RAM eater
by Anonymous Monk on Apr 17, 2012 at 21:52 UTC

Re: Binding operator RAM eater
by bulk88 (Priest) on Apr 17, 2012 at 23:42 UTC
by chromatic (Archbishop) on Apr 17, 2012 at 23:45 UTC
by techtruth (Novice) on Apr 19, 2012 at 05:31 UTC