I wanted to match strings containing many low-precision fixed-point numerals with "text"/whitespace in between. Because of the nature of the input source, the numerals have to be compared numerically, with some tolerance (fuzzily). I ended up with a solution that uses programmatically generated regular expressions with one (?(?{...})(*F)) per number, i.e. there are many of these in a regex.
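The approach described above can be sketched roughly as follows. This is only an illustration, not the actual production code: the helper name fuzzy_regex, the numeral pattern, and the absolute-tolerance test are all assumptions made for the example.

```perl
use strict;
use warnings;
use re 'eval';    # required: the pattern is interpolated at run time and contains (?{...})

# Hypothetical helper: build a regex from a template string so that every
# numeral in the input must lie within $tol of the template's numeral.
sub fuzzy_regex {
    my ( $template, $tol ) = @_;
    my $num    = '[+-]?\d+(?:\.\d+)?';             # assumed numeral syntax
    my @pieces = split /($num)/, $template;        # even index: text, odd index: numeral
    my $src    = '';
    for my $i ( 0 .. $#pieces ) {
        if ( $i % 2 ) {
            # Capture the numeral atomically, then fail via (*F) when the
            # code condition reports that it is out of tolerance.
            $src .= sprintf '((?>%s))(?(?{ abs( $^N - %s ) > %s })(*F))',
                $num, 0 + $pieces[$i], 0 + $tol;
        }
        else {
            # Literal text, with any run of whitespace matched loosely.
            $src .= join '\s+', map { quotemeta($_) } split /\s+/, $pieces[$i], -1;
        }
    }
    return qr/\A$src\z/;
}

my $re = fuzzy_regex( 'x = 12.50 y = 3.10', 0.05 );
print "close enough\n" if     'x = 12.52   y = 3.08' =~ $re;
print "too far\n"      unless 'x = 12.70 y = 3.10'   =~ $re;
```

Each numeral in the template contributes one capture group and one code condition, so a template with thousands of numerals produces a regex with thousands of (?{...}) blocks.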

With some input, Perl started to segfault or die with "Out of memory!" on me. On investigation, that input turned out to contain a degenerate case: a single sub-string with many thousands of numerals, for which one regex was generated. This sub-string should probably have been excluded from processing in the first place, but I was curious what was going on. Here is an SSCCE, reduced to a fairly useless no-op:

    use strict;
    use warnings;
    use re 'eval';
    STDOUT-> autoflush( 1 );

    use constant LEN => 10_000;

    my $n = 0;
    my $s = '1' x LEN;
    my $r = '(1)(?{
        # $^N ? 1 : 1;                  # (*) does use of $^N exacerbate?
        print "*" and select            # to better watch
            undef, undef, undef, 0.25   # with htop
                unless ++ $n % 1000
    })' x LEN;

    print "\nMatch\n" if $s =~ /^$r$/;
    print "1) Hit Enter"; <>;

    ( $s = '' ) =~ //;              # reset everything
                                    # about $s and re-engine (?)
    $s = ( int rand 10 ) x 1e9;     # allocate another Gb
    print "2) Hit Enter"; <>;

I'm testing with 64-bit Perl on Linux with 8 GB of RAM. With LEN => 10_000, Perl eats ~1 GB of memory and apparently sits on it, i.e. doesn't free it when it needs more. With 20_000, it's already ~4.5 GB, plus another 1 GB when the scalar is created (so memory is not freed even after the regex engine was reset?). With 20_000 and the (*) line un-commented, Perl segfaults after 13 stars, and it doesn't appear to have consumed all available RAM. With 30_000 and the (*) line commented out again, I get "panic: memory wrap at (eval 6) line 155489. Attempt to free unreferenced scalar: SV 0x56258a3af7b0, Perl interpreter: 0x562584d66260 at (eval 6) line 155489." after 22 stars.

Arguably, a regex with 10_000 (?{}) blocks is stupid, but I wonder whether this indicates a slow leak when a "normal" number of these blocks is used in a long-running process.
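One way to probe the slow-leak question would be to match a small regex of this kind repeatedly and watch the resident set size. The following is a minimal sketch under stated assumptions: the BLOCKS and ROUNDS values are arbitrary, the rss_kb helper is hypothetical, and it relies on the Linux-specific /proc/self/status file. It only reports numbers and does not by itself prove or disprove a leak.

```perl
use strict;
use warnings;
use re 'eval';

use constant BLOCKS => 50;       # assumed "normal" number of (?{}) blocks
use constant ROUNDS => 2_000;    # assumed number of matches to run

# Hypothetical helper: resident set size in kB, read from /proc (Linux only).
# Returns undef where /proc/self/status is unavailable.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

my $s = '1' x BLOCKS;
my $r = '(1)(?{ 1 })' x BLOCKS;

my $matched = 0;
my $before  = rss_kb();
for ( 1 .. ROUNDS ) {
    $matched++ if $s =~ /^$r$/;
}
my $after = rss_kb();

printf "matched %d of %d; RSS before: %s kB, after: %s kB\n",
    $matched, ROUNDS, $before // 'n/a', $after // 'n/a';
```

Raising ROUNDS and comparing the reported RSS between runs would show whether memory grows with the number of matches rather than with the size of a single regex.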


In reply to Memory use/leak with large number of (?{}) patterns in regex by vr
