in reply to Re^12: Memory leak question
in thread Memory leak question
I believe you are being bitten by regex engine leaks.
Here's what I discovered.
my %cache;

sub _iso8601_rx {
    my($self,$rx) = @_;
    my $dmt = $$self{'tz'};
    my $dmb = $$dmt{'base'};

    return $cache{ $rx } if exists $cache{ $rx };
}

$cache{cdate}    = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
$cache{ctime}    = '(?<h>\d\d):(?<mn>\d\d):(?<s>\d\d)';
$cache{fulldate} = "$cache{cdate}\\s+$cache{ctime}";

1;
my %cache;

sub _iso8601_rx {
    my($self,$rx) = @_;
    my $dmt = $$self{'tz'};
    my $dmb = $$dmt{'base'};

    return $cache{ $rx } if exists $cache{ $rx };
}

$cache{cdate} = <<'ERX';
(?i-xsm:(?:(?<y>\d\d\d\d)(?<m>\d\d)(?<d>\d\d)|(?<y>\d\d\d\d)\-(?<m>\d\d)\-(?<d>\d\d)|\-(?<y>\d\d)(?<m>\d\d)(?<d>\d\d)|\-(?<y>\d\d)\-(?<m>\d\d)\-(?<d>\d\d)|\-?(?<y>\d\d)(?<m>\d\d)(?<d>\d\d)|\-?(?<y>\d\d)\-(?<m>\d\d)\-(?<d>\d\d)|\-\-(?<m>\d\d)\-?(?<d>\d\d)|\-\-\-(?<d>\d\d)|(?<y>\d\d\d\d)\-?(?<doy>\d\d\d)|\-?(?<y>\d\d)\-?(?<doy>\d\d\d)|\-(?<doy>\d\d\d)|(?<y>\d\d\d\d)W(?<w>\d\d)(?<dow>\d)|(?<y>\d\d\d\d)\-W(?<w>\d\d)\-(?<dow>\d)|\-?(?<y>\d\d)W(?<w>\d\d)(?<dow>\d)|\-?(?<y>\d\d)\-W(?<w>\d\d)\-(?<dow>\d)|\-?(?<yod>\d)W(?<w>\d\d)(?<dow>\d)|\-?(?<yod>\d)\-W(?<w>\d\d)\-(?<dow>\d)|\-W(?<w>\d\d)\-?(?<dow>\d)|\-W\-(?<dow>\d)|\-\-\-(?<dow>\d)))
ERX

$cache{ctime} = <<'ERX';
(?-xism:(?:(?<h>[0-1][0-9]|2[0-3])(?<mn>[0-5][0-9])(?<s>[0-5][0-9])(?:[\.,]\d*)?|(?<h>[0-1][0-9]|2[0-3]):(?<mn>[0-5][0-9]):(?<s>
... bulk of the regex elided because PM won't let me post that much! ...
azt|ret|mot|gyt|lrt|ut|e|a|u|k|o|d|z|t|n|p|y|g|w|s|c|i|m|b|q|v|r|x|h|f|l)) \))? ))))?)
ERX

$cache{fulldate} = <<'ERX';
(?x-ism:^\s*(?: (?i-xsm:(?:(?<y>\d\d\d\d)(?<m>\d\d)(?<d>\d\d)|(?<y>\d\d\d\d)\-(?<m>\d\d)\-(?<d>\d\d)|\-(?<y>\d\d)(?<m>\d\d)(?<d>\d\d)|\-(?<y>\d\d)\-(?<m>\d\d)\-(?<d>\d\d)|\-?(?<y>\d\d)(?<m>\d\d)(?<d>\d\d)|\-?(?<y>\d\d)\-(?<m>\d\d)\-(?<d>\d\d)|\-\-(?<m>\d
... bulk of the regex elided because PM won't let me post that much in a single post! ...
nmt|lkt|gst|vet|tjt|eat|ept|cat|pht|pwt|nft|set|gft|hst|nut|qmt|mpt|trt|ywt|cdt|emt|met|ast|net|kst|ect|brt|bdt|mvt|cst|cvt|fmt|azt|ret|mot|gyt|lrt|ut|e|a|u|k|o|d|z|t|n|p|y|g|w|s|c|i|m|b|q|v|r|x|h|f|l)) \))? ))))?) | (?-xism:(?:(?<h>[0-1][0-9]|2[0-3])|\-(?<mn>[0-5][0-9]))) )\s*$)
ERX

1;
I thought it might be the use of (so many) named captures, but I tried very hard to make them leak: a single regex with 175,000 named captures, matched with /g against a string containing 10,000 matches for them, in a (very slow) loop. It grew very large, but once it maxed out, it didn't leak at all.
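A scaled-down sketch of that kind of stress test might look like the following. The sizes are hypothetical (far smaller than the 175,000 captures and 10,000 matches described above) so it finishes quickly; to look for a leak you would raise them and watch the process size while it runs.

```perl
#!/usr/bin/perl
# Scaled-down sketch of a named-capture stress test.
# Assumption: all sizes are hypothetical; the experiment described above
# used 175,000 named captures and 10,000 matches per string.
use strict;
use warnings;

my $n_captures = 500;    # hypothetical (post: 175,000)
my $n_matches  = 100;    # hypothetical (post: 10,000)
my $loops      = 20;     # hypothetical iteration count

# One big alternation of distinct named groups: (?<c0>a0b)|(?<c1>a1b)|...
my $pattern = join '|', map { "(?<c$_>a${_}b)" } 0 .. $n_captures - 1;
my $re      = qr/$pattern/;

# Subject string "a1b a2b ... a100b": each token matches exactly one group.
my $subject = join ' ', map { 'a' . ( $_ % $n_captures ) . 'b' } 1 .. $n_matches;

my $total = 0;
my %seen;
for ( 1 .. $loops ) {
    # Scalar-context //g walks the string one match at a time.
    while ( $subject =~ /$re/g ) {
        $total++;
        my ($name) = keys %+;    # only the group that matched is defined
        $seen{$name}++;
    }
}
printf "%d matches across %d distinct named groups\n", $total, scalar keys %seen;
```

The while loop is used rather than list-context //g because, with capture groups present, list context would return every group of every match instead of one element per match.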
So then I remembered that I'd seen the regex trie optimisation cause problems with large alternations, but disabling it didn't change anything.
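One way to switch the trie optimisation off is the perlvar special variable ${^RE_TRIE_MAXBUF}: a negative value disables it. The sketch below assumes a made-up alternation standing in for the real patterns. Note that the variable is consulted when a pattern is compiled, so it has to be set before the qr// runs; here the interpolated $alt forces run-time compilation inside the localized block.

```perl
#!/usr/bin/perl
# Sketch: compile the same large alternation with and without the trie
# optimisation. ${^RE_TRIE_MAXBUF} is documented in perlvar; a negative
# value disables the optimisation at pattern-compile time.
use strict;
use warnings;

# Hypothetical large alternation standing in for the real patterns.
my $alt = join '|', map { "word$_" } 1 .. 5_000;

my $with_trie = qr/^(?:$alt)$/;       # default buffer size: trie enabled

my $without_trie = do {
    local ${^RE_TRIE_MAXBUF} = -1;    # negative value: trie disabled
    qr/^(?:$alt)$/;                   # compiled at run time, inside the local
};

# Both patterns should accept exactly the same strings.
my %result;
for my $s ( 'word1', 'word4999', 'nope' ) {
    my $m1 = $s =~ $with_trie    ? 1 : 0;
    my $m2 = $s =~ $without_trie ? 1 : 0;
    $result{$s} = "$m1/$m2";
    print "$s: trie=$m1 no-trie=$m2\n";
}
```

Only the compiled internals should differ; if memory behaviour changes but the match results do not, the trie is implicated, and if neither changes (as reported above) it probably is not.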
Then I thought to try your monster regexes in a standalone script, matching them directly against the sample date in a loop:
#! perl
use strict;
use warnings;

my %cache = ( ctime => <<'RXA', cdate => <<'RXB', fulldate => <<'RXC' );
##... monster regex initialisation elided

my $refull  = qr[$cache{ fulldate }]x;
my $rectime = qr[$cache{ ctime }]x;
my $recdate = qr[$cache{ cdate }]x;

for ( 1 .. 100e6 ) {
    "2010-02-01 01:02:03" =~ $refull;
    "2010-02-01 01:02:03" =~ $rectime;
    "2010-02-01 01:02:03" =~ $recdate;
}
It doesn't leak at all. Not a jot.
So it isn't just the monster regexes; it's also how they are being used, or how their results are being used, that triggers the leak.
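To narrow down which usage matters, one option is to run each way of consuming a match many times and compare resident-memory growth. This is a sketch under stated assumptions: it reads /proc/self/status, so it only reports real figures on Linux; rss_kb() and the variant names are hypothetical; and the short pattern is a stand-in for the real (much larger) ones, which you would substitute along with the calling code's actual result handling.

```perl
#!/usr/bin/perl
# Sketch: compare memory growth across different ways of using one regex.
# Assumptions: Linux /proc/self/status; rss_kb() and the variant names are
# hypothetical helpers; the pattern stands in for the real ones.
use strict;
use warnings;

# Resident set size in kB, or -1 if it cannot be read on this platform.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return -1;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return -1;
}

my $re   = qr/(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)\s+(?<h>\d\d):(?<mn>\d\d):(?<s>\d\d)/;
my $date = '2010-02-01 01:02:03';

# Each variant uses the same regex but consumes the result differently.
my %variants = (
    scalar_match => sub { $date =~ $re ? 1 : 0 },
    list_match   => sub { my @caps = $date =~ $re; scalar @caps },
    copy_named   => sub { return unless $date =~ $re; my %h = %+; \%h },
);

for my $name ( sort keys %variants ) {
    my $code = $variants{$name};
    $code->() for 1 .. 1_000;      # warm-up so one-off allocations settle
    my $before = rss_kb();
    $code->() for 1 .. 100_000;
    my $after = rss_kb();
    printf "%-12s RSS %d kB -> %d kB (growth %d kB)\n",
        $name, $before, $after, $after - $before;
}
```

A variant whose growth keeps increasing with the iteration count, while the others level off, would be the one to look at more closely.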
I'm kinda stuck for a direction in which to go now, but I hope that this will help you zero in on the cause. I'll keep looking.