sandy105 has asked for the wisdom of the Perl Monks concerning the following question:

I usually use a while(<filehandle>) loop to run through the file line by line and do my work. But for a new script I had to do a global search on the input file, because it is a fixed-format file with 80 characters per line and I have no way of knowing if and where my search string will be split at a newline "\n" or other control characters. Now when I try to load the whole file into a string and then do a pattern match by

my $data = do {local $/; <FILE> }; #do pattern match

This gives an out-of-memory error, as expected, even for files a few hundred MB in size.

My question here is: can we modify the memory allocated to Perl, somewhat like we can configure the JVM's memory allocation (I am primarily a J2EE programmer)? More so because we run these scripts on enterprise servers with lots of memory.

Replies are listed 'Best First'.
Re: Out of memory
by Ratazong (Monsignor) on Jun 09, 2015 at 12:30 UTC

    You don't need to load the whole file into memory, only as many lines as your search string can span. So you can use the following approach (sketched in code after the steps):

    1. read n lines
    2. do pattern match
    3. remove the first line
    4. read one more line
    5. goto 2
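
    A minimal sketch of this loop in Perl (the file name, pattern, and window size are illustrative placeholders, not from the thread; note that overlapping windows may report the same match more than once, and deduplication is omitted for brevity):

    use strict;
    use warnings;

    my $file    = 'input.txt';
    my $pattern = qr/needle/s;    # /s lets . in a real pattern cross newlines
    my $n       = 4;              # window size in lines

    open my $fh, '<', $file or die "Cannot open $file: $!";

    my @window;
    while ( @window < $n and defined( my $line = <$fh> ) ) {
        push @window, $line;                    # 1. read n lines
    }

    while (@window) {
        my $chunk = join '', @window;           # 2. do pattern match
        print "match: $&\n" if $chunk =~ $pattern;
        shift @window;                          # 3. remove the first line
        my $next = <$fh>;                       # 4. read one more line
        push @window, $next if defined $next;   # 5. goto 2
    }
    close $fh;
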
    HTH, Rata

      ++Ratazong (when the Vote Fairy next visits) for this sliding window solution. But I have two quibbles:

      1. Say the search string is 230 characters long. Then since each input line is 80 characters, the search string is 3 lines long (because 3 x 80 = 240 is the smallest multiple of 80 to be >= 230). So n is 3. But the pattern may begin near the end of an input line and stretch over 4 lines. So the minimum size of the sliding window is n + 1 (320 characters for the example search string).

      2. Setting the window size to n + 1 lines will produce the smallest memory footprint. But it will also entail a large amount of processing, much of it duplicated, as the regex engine searches over and over within the same overlapping text. If the window size is, say, ten times the minimum (i.e., 3200 characters for the 230 character search string), only 3 of the ten lines need be duplicated in each subsequent window — already a significant saving in processing time. Determining an optimum window size — one which successfully balances memory usage against processing time — will depend on the OP’s requirements and available memory, and will likely require some trial-and-error. But I expect the savings in processing time will more than compensate for the time spent in optimising the window size.
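
      A sketch of that larger-window idea (the sizes and pattern here are illustrative assumptions): read the file in big chunks and carry only the last window-minimum bytes over into the next chunk, so a match straddling a chunk boundary is still seen. A match lying wholly inside the carried-over tail may be reported twice; deduplication is left out for brevity.

      use strict;
      use warnings;

      my $file     = 'input.txt';
      my $pattern  = qr/needle/s;
      my $max_span = 320;               # n + 1 lines for the 230-char example
      my $overlap  = $max_span - 1;     # bytes carried into the next window
      my $window   = 10 * $max_span;    # about ten times the minimum

      open my $fh, '<', $file or die "Cannot open $file: $!";

      my $buffer = '';
      while ( read( $fh, my $block, $window ) ) {
          $buffer .= $block;
          while ( $buffer =~ /$pattern/g ) {
              print "match: $&\n";
          }
          # keep only the tail as the overlap for the next chunk
          $buffer = substr $buffer, -$overlap if length($buffer) > $overlap;
      }
      close $fh;
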

      Hope that helps,

      Athanasius <°(((><  contra mundum

        Those are very valid points, thanks much @Athanasius.

      Sorry for the late reply. Although the search string appears at random positions, it can span no more than 3 lines, so yes, I can probably start doing what you suggested. Thanks!

Re: Out of memory
by Corion (Patriarch) on Jun 09, 2015 at 12:28 UTC

    There is no way to specifically tune the memory used by Perl.

    Consult with your system administrator about existing ulimits.

    Also, consider whether your Perl is built for a 32-bit architecture. If so, it can't use more than 3 GB of memory in total.
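
    For example, a quick way to check, using the core Config module, is to look at the pointer size your perl was built with:

    use Config;
    # ptrsize is 4 on a 32-bit perl, 8 on a 64-bit one
    print "$Config{archname}: $Config{ptrsize}-byte pointers\n";

    The same information is available from the shell via perl -V:ptrsize.
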

    Consider whether you really, really need to slurp the whole file into memory to do your change.

Re: Out of memory
by BrowserUk (Patriarch) on Jun 09, 2015 at 12:37 UTC

    You'll double the size of file you can load by using this:

    my $data; do {local $/; $data = <FILE> };

    Instead of:

    my $data = do {local $/; <FILE> };

    And probably, though I haven't tested it recently, faster by doing:

    my $size = -s( $filename );
    open my $fh, '<', $filename or die $!;
    read( $fh, my $data, $size );

    But if you're being limited to a few hundred MB on a system with many GB available, Corion's right: either your process is a 32-bit Perl, or it is subject to a ulimit, or both.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      You'll double the size of file you can load
      This was probably an issue before 5.20, which introduced copy-on-write strings, and should no longer matter with recent versions.

Re: Out of memory
by Discipulus (Canon) on Jun 09, 2015 at 19:59 UTC
    Mmh, that sounds very strange for a few hundred MB. Anyway, you can use a multiline regex with the /m modifier against a pair of lines at a time (1-2, 2-3, 3-4, ...), at least to debug why you run out of memory. As you said you are not principally a Perl programmer: are you using strict and warnings? In fact I humbly but strongly suspect the problem is not the size of the file.
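
    A minimal sketch of that pair-at-a-time idea (the file name and pattern are placeholders):

    use strict;
    use warnings;

    my $file    = 'input.txt';
    my $pattern = qr/needle/m;

    open my $fh, '<', $file or die $!;
    my $prev = <$fh> // '';
    while ( defined( my $line = <$fh> ) ) {
        my $pair = $prev . $line;       # pairs 1-2, 2-3, 3-4, ...
        print "match ending at line $.\n" if $pair =~ $pattern;
        $prev = $line;
    }
    close $fh;
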

    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Out of memory
by karlgoethebier (Abbot) on Jun 11, 2015 at 13:48 UTC
    "...no way of knowing if & where my search string will be split at new line "\n"...i try to load the whole file into a string and then do a pattern match"

    I'm unsure if I understood your specs right, but perhaps you want to do something like this:

    use strict;
    use warnings;
    use feature qw(say);

    my $file    = q(big.txt);
    my $size    = -s( $file );
    my $pattern = qr((y+)(?:\n*)(y*));

    open my $fh, '<', $file or die $!;
    read( $fh, my $data, $size );
    close $fh;

    while ( $data =~ m/$pattern/g ) {
        say qq($1$2);
    }

    __END__
    # data like this:
    xyyyyyxxxx
    xxxxxxxxxx
    xxxxxxxxyy
    yyyxxxxxxx
    xxxxxxxxxx
    xxxxxxyyyy
    yxxxxxxxxx
    xxxxxxxxxx
    xxxxxyyyyy
    xxxxxxxxxx
    xxxxxxxxxx
    xxxxxxxxxx
    # small version ;-)

    Desktop\monks>1129652.pl
    yyyyy
    yyyyy
    yyyyy
    yyyyy

    The example runs on a 350 MB file on XP 32-bit with 2 GB RAM in ~1 s.

    Just an idea.

    Regards, Karl

    P.S.: If I use File::Map I get Out of Memory!

    «The Crux of the Biscuit is the Apostrophe»

      I will try this and report back, thanks.

Re: Out of memory
by sundialsvc4 (Abbot) on Jun 09, 2015 at 14:05 UTC

    If you know that you are looking for a pattern that will not stretch over more than 2 (or, in general, n) consecutive records, then you can simply build an array of the first n records in the file (to “prime the pump”), then proceed as follows (sketched in code below):

    1. Concatenate all n records in the current list into a single string, then search within that string.
    2. shift the first record off the head of the array, and push the next record onto the tail of it.
    3. Rinse and repeat, until an undef indicates that you have reached the end of the file.

    No matter how enormous the file being processed may be, the memory requirements are negligible.
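
    A minimal sketch of this record-window loop (names are illustrative; it is essentially the same sliding window as in Ratazong's reply above):

    use strict;
    use warnings;

    my ( $file, $pattern, $n ) = ( 'input.txt', qr/needle/s, 2 );

    open my $fh, '<', $file or die $!;

    my @records;
    while ( @records < $n and defined( my $rec = <$fh> ) ) {
        push @records, $rec;            # prime the pump
    }

    while (@records) {
        my $string = join '', @records; # concatenate, then search
        print "found: $&\n" if $string =~ $pattern;
        shift @records;                 # drop the head of the array
        my $next = <$fh>;
        last unless defined $next;      # undef marks the end of the file
        push @records, $next;           # push the next record onto the tail
    }
    close $fh;
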