in reply to Re^12: Memory leak question
in thread Memory leak question
I believe you are being bitten by regex engine leaks.
Here's what I discovered.
    my %cache;
    sub _iso8601_rx {
        my($self,$rx) = @_;
        my $dmt = $$self{'tz'};
        my $dmb = $$dmt{'base'};
        return $cache{ $rx } if exists $cache{ $rx };
    }
    $cache{cdate}    = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
    $cache{ctime}    = '(?<h>\d\d):(?<mn>\d\d):(?<s>\d\d)';
    $cache{fulldate} = "$cache{cdate}\\s+$cache{ctime}";
    1;
At first I thought it might be the use of (so many) named captures, but I tried very hard to make them leak and couldn't: a single regex with 175,000 named captures, matched with /g against a string containing 10,000 matches for them, in a (very slow) loop. Memory use grew very large, but once it maxed out, it didn't leak at all.
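For anyone wanting to repeat that kind of test, here is a much scaled-down sketch of the idea, not the script I actually ran. The helper `rss_kb` is hypothetical and assumes a Linux-style /proc filesystem (it prints n/a elsewhere); the capture names `n1`, `n2`, ... and the word list are made up for illustration.

```perl
#! perl
use strict;
use warnings;
use 5.010;    # named captures and the // operator

# Hypothetical helper: resident set size in kB. Assumes Linux /proc;
# returns undef on platforms where the file does not exist.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)/;
    }
    return;
}

# Count every /g match of $re against a private copy of $subject.
sub count_matches {
    my ( $re, $subject ) = @_;
    my $count = 0;
    $count++ while $subject =~ /$re/g;
    return $count;
}

# One regex with many distinct named captures (1,000 here, not 175,000).
my $n       = 1_000;
my $re      = do { my $rx = join '|', map { "(?<n$_>w$_\\b)" } 1 .. $n; qr/$rx/ };
my $subject = join ' ', map { "w$_" } 1 .. $n;

# Repeat the same work; a leak would show as memory that keeps climbing
# from pass to pass rather than levelling off.
for my $pass ( 1 .. 5 ) {
    my $matches = count_matches( $re, $subject );
    printf "pass %d: %d matches, rss=%s kB\n", $pass, $matches, rss_kb() // 'n/a';
}
```

The thing to watch is whether the reported size plateaus after the first pass or two. Growth followed by a plateau is just the engine sizing its buffers, which is what I saw.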
So then I remembered that I'd seen the regex trie optimisation cause problems with large alternations, but disabling it didn't change things.
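One documented way to switch the trie optimisation off is to set `${^RE_TRIE_MAXBUF}` (see perlvar) to a negative value before the pattern is compiled. The sketch below only shows that mechanism on a made-up alternation; the word list is hypothetical and much smaller than the real regexes.

```perl
#! perl
use strict;
use warnings;

# Hypothetical stand-in for a large alternation.
my @words = map { "word$_" } 1 .. 1_000;
my $alt   = join '|', @words;

# A negative value is documented as preventing the trie optimisation.
# It has to be set before the pattern is compiled, so the pattern is
# built from an interpolated string at run time.
${^RE_TRIE_MAXBUF} = -1;
my $re_no_trie = qr/\A(?:$alt)\z/;

# The pattern must still match exactly the same strings as before.
my $hits = grep { $_ =~ $re_no_trie } @words;
print "matched $hits of ", scalar @words, "\n";
```

Setting the variable changes how the alternation is compiled, not what it matches, so any difference in memory behaviour between the two settings can be pinned on the optimisation.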
Then I thought to try your monster regexes in a standalone script and run them directly on the sample date in a loop:
    #! perl
    use strict;

    my %cache = ( ctime => <<'RXA', cdate => <<'RXB', fulldate => <<'RXC' );
    ##... monster regex initialisation elided

    # Compile each pattern once, outside the loop
    my $refull  = qr[$cache{ fulldate }]x;
    my $rectime = qr[$cache{ ctime }]x;
    my $recdate = qr[$cache{ cdate }]x;

    # Match the sample date against all three, 100 million times
    for (1..100e6) {
        "2010-02-01 01:02:03" =~ $refull;
        "2010-02-01 01:02:03" =~ $rectime;
        "2010-02-01 01:02:03" =~ $recdate;
    }
It doesn't leak at all. Not a jot.
So it's not just the monster regexes, but also how they are being used, or how their results are being used, that triggers the leak.
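One obvious next experiment is to consume the captures the way a caller might, rather than just matching and throwing the result away. The following is only a guess at such a usage pattern, using the simple cdate/ctime patterns from the top of this post. The function `parse_date` is hypothetical and is not taken from your module.

```perl
#! perl
use strict;
use warnings;
use 5.010;    # named captures and %+

# The simple patterns shown earlier, compiled once.
my $cdate    = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
my $ctime    = '(?<h>\d\d):(?<mn>\d\d):(?<s>\d\d)';
my $fulldate = qr/$cdate\s+$ctime/;

# Hypothetical caller: copy the named captures out of %+ and hand
# them back, so the results outlive the match itself.
sub parse_date {
    my ($string) = @_;
    return unless $string =~ $fulldate;
    my %fields = %+;
    return \%fields;
}

# Repeat the match-and-copy step while watching the process size
# in top or Task Manager.
for ( 1 .. 100_000 ) {
    my $fields = parse_date("2010-02-01 01:02:03");
}
```

If this version grows without bound where the bare-match loop stays flat, the leak is in how the results are read, not in the matching.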
I'm kinda stuck for a direction in which to go now, but I hope that this will help you zero in on the cause. I'll keep looking.