sandy105 has asked for the wisdom of the Perl Monks concerning the following question:

I usually use a while(<filehandle>) loop to run through the file line by line and do my work. But for a new script I had to do a global search on the input file, because it is a fixed-format file with 80 characters per line and I have no way of knowing if and where my search string will be split at a newline "\n" or other control characters. Now when I try to load the whole file into a string and then do a pattern match by

my $data = do {local $/; <FILE> }; #do pattern match

This gives an out-of-memory error, as expected, even for files a few hundred MB in size.

My question here is: can we modify the memory allocated to Perl, somewhat like we can configure the JVM's memory allocation (I am primarily a J2EE programmer)? More so because we run these scripts on enterprise servers with lots of memory.

Replies are listed 'Best First'.
Re: Out of memory
by Ratazong (Monsignor) on Jun 09, 2015 at 12:30 UTC

    You don't need to load the whole file into memory, only as many lines as your search string can span. So you can use the following approach (sketched in code after the steps):

    1. read n lines
    2. do pattern match
    3. remove the first line
    4. read one more line
    5. goto 2
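
    A minimal sketch of this loop in Perl (the file name, pattern, and window size are illustrative placeholders, not from the thread; note that overlapping windows may report the same match more than once, and deduplication is omitted for brevity):

    use strict;
    use warnings;

    my $file    = 'input.txt';
    my $pattern = qr/needle/s;    # /s lets . in a real pattern cross newlines
    my $n       = 4;              # window size in lines

    open my $fh, '<', $file or die "Cannot open $file: $!";

    my @window;
    while ( @window < $n and defined( my $line = <$fh> ) ) {
        push @window, $line;                    # 1. read n lines
    }

    while (@window) {
        my $chunk = join '', @window;           # 2. do pattern match
        print "match: $&\n" if $chunk =~ $pattern;
        shift @window;                          # 3. remove the first line
        my $next = <$fh>;                       # 4. read one more line
        push @window, $next if defined $next;   # 5. goto 2
    }
    close $fh;
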
    HTH, Rata

      ++Ratazong (when the Vote Fairy next visits) for this sliding window solution. But I have two quibbles:

      1. Say the search string is 230 characters long. Then since each input line is 80 characters, the search string is 3 lines long (because 3 x 80 = 240 is the smallest multiple of 80 to be >= 230). So n is 3. But the pattern may begin near the end of an input line and stretch over 4 lines. So the minimum size of the sliding window is n + 1 (320 characters for the example search string).

      2. Setting the window size to n + 1 lines will produce the smallest memory footprint. But it will also entail a large amount of processing, much of it duplicated, as the regex engine searches over and over within the same overlapping text. If the window size is, say, ten times the minimum (i.e., 3200 characters for the 230 character search string), only 3 of the ten lines need be duplicated in each subsequent window — already a significant saving in processing time. Determining an optimum window size — one which successfully balances memory usage against processing time — will depend on the OP’s requirements and available memory, and will likely require some trial-and-error. But I expect the savings in processing time will more than compensate for the time spent in optimising the window size.
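
      A sketch of that larger-window idea (the sizes and pattern here are illustrative assumptions): read the file in big chunks and carry only the last window-minimum bytes over into the next chunk, so a match straddling a chunk boundary is still seen. A match lying wholly inside the carried-over tail may be reported twice; deduplication is left out for brevity.

      use strict;
      use warnings;

      my $file     = 'input.txt';
      my $pattern  = qr/needle/s;
      my $max_span = 320;               # n + 1 lines for the 230-char example
      my $overlap  = $max_span - 1;     # bytes carried into the next window
      my $window   = 10 * $max_span;    # about ten times the minimum

      open my $fh, '<', $file or die "Cannot open $file: $!";

      my $buffer = '';
      while ( read( $fh, my $block, $window ) ) {
          $buffer .= $block;
          while ( $buffer =~ /$pattern/g ) {
              print "match: $&\n";
          }
          # keep only the tail as the overlap for the next chunk
          $buffer = substr $buffer, -$overlap if length($buffer) > $overlap;
      }
      close $fh;
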

      Hope that helps,

      Athanasius <°(((><  contra mundum

        Those are very valid points, thanks much @Athanasius.

      Sorry for the late reply. Although the search string appears at random positions, it can span no more than 3 lines, so yes, I can probably start doing what you suggested. Thanks!

Re: Out of memory
by Corion (Patriarch) on Jun 09, 2015 at 12:28 UTC

    There is no way to specifically tune the memory used by Perl.

    Consult with your system administrator about existing ulimits.

    Also, consider whether your Perl is built for a 32-bit architecture. If so, it can't use more than 3 GB of memory in total.
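
    For example, a quick way to check, using the core Config module, is to look at the pointer size your perl was built with:

    use Config;
    # ptrsize is 4 on a 32-bit perl, 8 on a 64-bit one
    print "$Config{archname}: $Config{ptrsize}-byte pointers\n";

    The same information is available from the shell via perl -V:ptrsize.
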

    Consider whether you really, really need to slurp the whole file into memory to do your change.

Re: Out of memory
by BrowserUk (Patriarch) on Jun 09, 2015 at 12:37 UTC

    You'll double the size of file you can load by using this:

    my $data; do {local $/; $data = <FILE> };

    Instead of:

    my $data = do {local $/; <FILE> };

    And probably, though I haven't tested it recently, faster by doing:

    my $size = -s( $filename );
    open my $fh, '<', $filename or die $!;
    read( $fh, my $data, $size );

    But if you're being limited to a few hundred MB on a system with many GB available, Corion's right: either your process is a 32-bit Perl, or it is subject to a ulimit, or both.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      You'll double the size of file you can load
      This was probably an issue before 5.20, which introduced copy-on-write strings, and should no longer matter with recent versions.

Re: Out of memory
by Discipulus (Canon) on Jun 09, 2015 at 19:59 UTC
    Mmh, that sounds very strange for a few hundred MB. Anyway, you can use a multiline regex with the /m modifier against a pair of lines at a time (1-2, 2-3, 3-4, ...), at least to debug why you run out of memory. As you said you are not principally a Perl programmer: are you using strict and warnings? In fact I humbly but strongly suspect the problem is not the size of the file.
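
    A minimal sketch of that pair-at-a-time idea (the file name and pattern are placeholders):

    use strict;
    use warnings;

    my $file    = 'input.txt';
    my $pattern = qr/needle/m;

    open my $fh, '<', $file or die $!;
    my $prev = <$fh> // '';
    while ( defined( my $line = <$fh> ) ) {
        my $pair = $prev . $line;       # pairs 1-2, 2-3, 3-4, ...
        print "match ending at line $.\n" if $pair =~ $pattern;
        $prev = $line;
    }
    close $fh;
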

    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Out of memory
by karlgoethebier (Abbot) on Jun 11, 2015 at 13:48 UTC
    "...no way of knowing if & where my search string will be split at new line "\n"...i try to load the whole file into a string and then do a pattern match"

    I'm unsure if I understood your specs right, but perhaps you want to do something like this:

    use strict;
    use warnings;
    use feature qw(say);

    my $file    = q(big.txt);
    my $size    = -s( $file );
    my $pattern = qr((y+)(?:\n*)(y*));

    open my $fh, '<', $file or die $!;
    read( $fh, my $data, $size );
    close $fh;

    while ( $data =~ m/$pattern/g ) {
        say qq($1$2);
    }

    __END__
    # data like this:
    xyyyyyxxxx
    xxxxxxxxxx
    xxxxxxxxyy
    yyyxxxxxxx
    xxxxxxxxxx
    xxxxxxyyyy
    yxxxxxxxxx
    xxxxxxxxxx
    xxxxxyyyyy
    xxxxxxxxxx
    xxxxxxxxxx
    xxxxxxxxxx
    # small version ;-)

    Desktop\monks>1129652.pl
    yyyyy
    yyyyy
    yyyyy
    yyyyy

    The example runs on a 350 MB file on XP 32-bit with 2 GB RAM in ~1 s.

    Just an idea.

    Regards, Karl

    P.S.: If I use File::Map I get Out of Memory!

    «The Crux of the Biscuit is the Apostrophe»

      I will try this and report back, thanks.

Re: Out of memory
by sundialsvc4 (Abbot) on Jun 09, 2015 at 14:05 UTC

    If you know that you are looking for a pattern that will not stretch over more than 2 (or, in general, n) consecutive records, then you can simply build an array of the first n records in the file (to “prime the pump”), then proceed as follows (sketched in code below):

    1. Concatenate all n records in the current list into a single string, then search within that string.
    2. shift the first record off the head of the array, and push the next record onto the tail of it.
    3. Rinse and repeat, until an undef indicates that you have reached the end of the file.

    No matter how enormous the file being processed may be, the memory requirements are negligible.
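
    A minimal sketch of this record-window loop (names are illustrative; it is essentially the same sliding window as in Ratazong's reply above):

    use strict;
    use warnings;

    my ( $file, $pattern, $n ) = ( 'input.txt', qr/needle/s, 2 );

    open my $fh, '<', $file or die $!;

    my @records;
    while ( @records < $n and defined( my $rec = <$fh> ) ) {
        push @records, $rec;            # prime the pump
    }

    while (@records) {
        my $string = join '', @records; # concatenate, then search
        print "found: $&\n" if $string =~ $pattern;
        shift @records;                 # drop the head of the array
        my $next = <$fh>;
        last unless defined $next;      # undef marks the end of the file
        push @records, $next;           # push the next record onto the tail
    }
    close $fh;
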