It is possible to modify the code to get an Out of Memory error on MSYS2 5.36 and Strawberry Perl 5.20 and 5.38. This does not occur with a perlbrewed 5.36 on Ubuntu via WSL, nor Strawberry Perl 5.18.

Collating an array of lines from the in-memory file handle is sufficient to trigger the OOM. Commenting out the regex in the in-memory file handle loop makes the OOM go away, as does modifying the string before adding it to the array (for example, pushing $_ . "" instead of $_).
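A minimal sketch of how one might compare the two push forms on a small input. It uses the core B module to sum the allocated buffer size (LEN) and used length (CUR) of the stored scalars. The input string, the array names and the totals() helper are made up for illustration, and whether the totals differ will depend on the perl build. This is a diagnostic aid, not a confirmed explanation of the OOM.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use B ();

# Hypothetical small input; the real test case uses a ~200,000-line file.
my $s = join '', map { "line $_ of some random text\n" } 1 .. 1000;

open my $mfh, '<', \$s or die "Cannot open in-memory handle: $!";

my (@plain, @copied);
my $matches = 0;
while (<$mfh>) {
    /^ ?line/ && $matches++;    # a regex on $_, as in the original loop
    push @plain,  $_;           # the form reported to trigger the OOM
    push @copied, $_ . "";      # the form reported to avoid it
}
close $mfh or die "Cannot close in-memory handle: $!";

# Sum allocated buffer sizes (LEN) and used lengths (CUR) over a list of scalars.
sub totals {
    my ($len, $cur) = (0, 0);
    for my $sv (map { B::svref_2object(\$_) } @_) {
        $len += $sv->LEN;
        $cur += $sv->CUR;
    }
    return ($len, $cur);
}

printf "plain:  LEN=%d CUR=%d\n", totals(@plain);
printf "copied: LEN=%d CUR=%d\n", totals(@copied);
```

If the LEN total for the plain array is far larger than its CUR total on an affected perl, that would point at oversized string buffers being kept alive by the array. Devel::Peek's Dump on individual elements shows the same two fields.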

(I also modified the code to use a separate variable for the in-memory file handle. It has no effect but is arguably cleaner.)

#!/usr/bin/env perl
use warnings;
use strict;
use feature 'say';    # needed for say() below
use Time::HiRes qw( time );
use Devel::Peek;

my $file = shift @ARGV;
my ($fh, $time);
my (@arr1, @arr2);
my $use_dump = 0;

if (!$file) {
    # should use Path::Tiny::tempfile
    $file = 'tempfile.txt';
    open my $ofh, '>', $file or die "Cannot open $file for writing, $!";
    srand(1234567);
    for my $i (0..200000) {
        my $string = 'some random text ' . rand();
        $string = $string x (1 + int (rand() * 10));
        if (rand() < 0.163) {
            $string = " Query${string}";
        }
        say {$ofh} $string;
    }
    $ofh->close or die "Cannot close $file, $!";
    printf "%s is size %i Mb\n", $file, (-s $file) / (1024**2);
}

open $fh, "<", $file or die "Cannot open $file for reading, $!";
# slurp the whole file into $s for the in-memory file handle below
my $s = do {local $/ = undef; <$fh>};
seek $fh, 0, 0;

print "\n\n";

# Pass 1: read lines from disk
$time = time;
my $match_count1 = 0;
my $i1 = 0;
while(<$fh>) {
    /^ ?Query/ && $match_count1++;
    push @arr1, $_;
    if ($use_dump and /^ Query/) {
        Dump $_;
        $i1++;
        last if $i1 > 5;
    }
}
printf "%f read lines from disk and do RE ($match_count1 matches).\n", time - $time;
$fh->close;

# Pass 2: read the same lines from the in-memory file handle
open my $mfh, "<", \$s or die "Cannot open in-memory file handle, $!";
$time = time;
my $match_count2 = 0;
my $i2 = 0;
while(<$mfh>) {
    # comment this out to avoid the OOM
    /^ ?Query/ && $match_count2++;
    #push @arr2, ($_ . ""); # avoids OOM
    push @arr2, $_; # OOM!
    if ($use_dump and /^ Query/) {
        Dump $_;
        $i2++;
        last if $i2 > 5;
    }
}
printf "%f read lines from in-memory file and do RE ($match_count2 matches).\n", time - $time;
$mfh->close;

In reply to Re: RE on lines read from in-memory scalar is very slow (OOM variant) by swl
in thread RE on lines read from in-memory scalar is very slow by Danny
