PerlMonks
Memory use/leak with large number of (?{}) patterns in regex
by vr (Curate) on Nov 24, 2019 at 13:09 UTC ( [id://11109143]=perlquestion )
vr has asked for the wisdom of the Perl Monks concerning the following question: I wanted to match strings containing many fixed-point, low-precision numerals, with "text"/whitespace in between. Because of the nature of the input source, their comparison should be numerically tolerant/fuzzy. I ended up with a solution which involves programmatically generated regular expressions with (?(?{...})(*F)) per number, i.e. there are many of these in a regex. With some input, Perl started to segfault and report "Out of memory!" on me. Upon investigation, this input happens to have a degenerate case with many thousands of numerals in a single sub-string, for which a regex was created. This sub-string should probably have been excluded from processing in the first place, but I was curious what's going on. Here is an SSCCE, redacted to a quite useless no-op:
I'm testing with 64-bit Perl on Linux with 8 GB RAM. With LEN => 10_000, Perl eats ~1 GB of memory, and apparently sits on it/doesn't free it when it needs more. With 20_000, it's already ~4.5 GB, plus 1 GB upon scalar creation (memory is not freed even after the regex engine was reset?). With 20_000 and the (*) line un-commented, Perl segfaults after 13 stars; it doesn't appear to have consumed all available RAM. With 30_000 and the (*) line commented back out, it's "panic: memory wrap at (eval 6) line 155489. Attempt to free unreferenced scalar: SV 0x56258a3af7b0, Perl interpreter: 0x562584d66260 at (eval 6) line 155489." after 22 stars. Arguably, a regex with 10_000 (?{}) blocks is stupid, but I wonder if it indicates a slow leak in the case of a "normal" number of these patterns in a long-running process.
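For readers unfamiliar with the construct, here is a minimal sketch of the general technique described above: one capture plus one (?(?{...})(*F)) check per numeral. It is not the SSCCE from this post and does not reproduce the LEN constant or the (*) line; the helper name fuzzy_regex, the tolerance argument, and the sample strings are assumptions made up for illustration.

```perl
use strict;
use warnings;
use re 'eval';    # needed because the (?{...}) blocks are interpolated at run time

# Hypothetical helper (not from the original post): build a regex from a
# template string so that every fixed-point numeral in the subject must be
# within $tol of the corresponding numeral in the template.
sub fuzzy_regex {
    my ($template, $tol) = @_;
    my $src = '';
    # split with a capturing group keeps the numerals as separate pieces
    for my $piece (split /([0-9]+\.[0-9]+)/, $template) {
        if ($piece =~ /\A[0-9]+\.[0-9]+\z/) {
            my $target = $piece + 0;    # normalise, e.g. "007.50" becomes 7.5
            # capture a numeral; $^N is the most recently closed group;
            # if it is too far from the target, (*F) forces a failure
            $src .= "([0-9]+\\.[0-9]+)(?(?{ abs(\$^N - $target) > $tol })(*F))";
        }
        else {
            $src .= quotemeta $piece;   # literal text between numerals
        }
    }
    return qr/\A$src\z/;
}

my $re = fuzzy_regex('x = 1.50, y = 20.25', 0.02);
print "close values match\n"      if 'x = 1.51, y = 20.24' =~ $re;
print "distant value rejected\n"  if 'x = 1.60, y = 20.25' !~ $re;
```

Each numeral in the template contributes one code block to the compiled pattern, so a template with many thousands of numerals yields a regex with many thousands of (?{}) blocks, which is the degenerate case the question is about.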