Hi PerlMonks,

I have a question about Perl and the way it handles RAM with the regex binding operator (=~).

Preface: I am running Debian Squeeze with 1024 MB of RAM, and my Perl version is 5.10.1. I have chosen not to upgrade my RAM.

Note: I think it is important to mention that the string I wish to run the global match against is around 50 MB. :(

Problem: I am attempting to return an array from a global match on a very large string (my @array = $somebigstring =~ /regex/g) but it eats up (in my understanding) far more memory than it should.

My (failing) Solutions: I have used undef() on both $somebigstring and @array, which frees an amount of memory more or less equal to their size. This still leaves nearly all of my RAM taken up by some unknown data. I have also tried writing the data to a file and then processing it line by line.

My Thoughts: I believe that the binding operator may set some small variables for each match. The collective size of these small variables becomes very large if many matches are made, as in a global match.

Are my assumptions correct, and if so is there a nice way to tell perl to release or not store that "extra" data? I am aware that perl has a memory "pool" that does not release memory to the OS until the script is over.

I will provide a code snippet below, and would like any suggestions on how to handle the memory hogging.

    print "\tProcessing data...\n";
    foreach (@links) {
        my $link = $_;

        # Get data from the internet
        my $httpReply = $browser->get($link);

        ####### This is my problem, the regex match eats up a lot of RAM.
        my @data = $httpReply->content =~ /regex/g;
        undef($httpReply); #Frees memory round the size of the webpage.
        #undef(@data);     #Frees memory around the size of the array.
        #######

        ####### If the RAM is nearly full, I am unable to completely store the values in a hash.
        # Add each extracted data to the hash
        while (scalar @data) {
            my $line = shift(@data);
            my ($field1, $field2) = split(':', lc($line));

            # Sort the emails into a hash of arrays. key = domain; value = [username, reference]
            push(@{$emailHash{$field2}}, [$field1, $link]);
        }
    }
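For comparison, here is a minimal sketch of the same extraction written as a scalar-context match loop, so that the intermediate @data list is never built. I do not know whether this addresses the memory growth. The sample $content string, the pattern, and the placeholder URL are assumptions made up for illustration; my real regex and page differ. As far as I understand, the content has to be copied into a variable first, because the match position is stored on the scalar being matched and a fresh return value from a method call would restart the match on every iteration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $httpReply->content; the real page is around 50 MB.
my $content = "Alice:Example.com\nbob:example.org\ncarol:example.com\n";
my $link    = 'http://example.invalid/page';    # placeholder URL, not a real source

my %emailHash;

# In scalar context, //g returns one match per iteration and remembers
# pos($content), so no list of all matches is held in memory at once.
# The pattern is an assumed "username:domain" line format.
while ($content =~ /^([^:\n]+):([^:\n]+)$/mg) {
    my ($field1, $field2) = (lc $1, lc $2);

    # key = domain; value = [username, reference]
    push @{ $emailHash{$field2} }, [$field1, $link];
}

# Show how many entries were collected per domain.
printf "%s: %d\n", $_, scalar @{ $emailHash{$_} } for sort keys %emailHash;
```

The loop body does the same lowercasing and hash-of-arrays push as my snippet above, only one match at a time.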

In reply to Binding operator RAM eater by techtruth
