dmn has asked for the wisdom of the Perl Monks concerning the following question:

O holiest monks of perl,

I am processing a very large amount of data that contains nested {{}} blocks which are completely irrelevant to me. All I need to do is find them and remove them, but I need to do this several billion times on arbitrarily large (did I mention large?) strings. The following works flawlessly:

my (@array) = $text =~ m/( \{\{ (?: [^\{\}]* | (?1) )* \}\} )/xg;
$text =~ s/\Q$_\E/ / foreach @array;

...up until a certain point at which, I imagine, the machine runs out of memory since it is memoizing this regex. The problem with that is that the things in {{}} occur so infrequently that memoization is actually hurting here, not helping.

Is there conceivably a more efficient way to do this? Please tell me there is and I've been going about Googling for it the wrong way. :)

TIA, DMN
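One thing worth noting about the snippet in the question: it first collects every match into @array and then runs a separate s/// per match, each of which searches $text again from the start. A minimal sketch of doing the removal in a single s///g pass follows. It assumes Perl 5.10 or later (for recursion and possessive quantifiers) and the same grammar as the original pattern, so a lone { or } inside a block still prevents a match. The sample string and the $clean variable are illustrative only, and whether this helps with the memory problem on the real data is untested.

```perl
use strict;
use warnings;

# Sample input, for illustration only.
my $text = 'this {{}}should be {{test {{ab{{c}}d}} zzzz}}good {{bw}}';

# One pass: match a balanced {{...}} block and replace it with a space.
#   [^{}]++  possessive run of non-brace characters (no backtracking into it)
#   (?R)     recurse into the whole pattern for a nested block
#   )*+      possessive repetition of the alternation
(my $clean = $text) =~ s/ \{\{ (?: [^{}]++ | (?R) )*+ \}\} / /xg;

print "$clean\n";
```

Because nothing is captured and nothing is rescanned, the list of matches never has to be held in memory.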

Replies are listed 'Best First'.
Re: Efficient way to match and replace nested braces (etc.)
by eyepopslikeamosquito (Archbishop) on Apr 12, 2012 at 21:15 UTC
Re: Efficient way to match and replace nested braces (etc.)
by dmn (Initiate) on Apr 12, 2012 at 19:39 UTC
    In a way I hate it when I post a question then manage to answer it myself 10 minutes later. :) I managed to a.) read Programming Perl (LOL), and b.) cut out the recursion:
    while ($text=~m/(( \{+) .*? (??{ '\}' x length $2 }))/xg) { $text=~s/\Q$1\E/ /; }

    Of course, suggestions for improvements are more than welcome. :)

    - DMN
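For comparison, the recursion can also be cut out by counting depth by hand instead of asking the regex engine to track nesting. The following is only a sketch under stated assumptions: strip_double_braces is a hypothetical helper, not something from this thread, and it is deliberately more lenient than the patterns above (single braces inside a block are ignored, a stray }} is kept, and an unclosed {{ is left untouched from the outermost opener onward).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace each balanced
# {{...}} span, including anything nested inside it, with one space.
sub strip_double_braces {
    my ($text) = @_;
    my ($out, $depth, $pos, $open_at) = ('', 0, 0, 0);

    # Walk the delimiters left to right; $text is never modified,
    # so the /g position is preserved between iterations.
    while ($text =~ /(\{\{|\}\})/g) {
        my ($tok, $start, $end) = ($1, $-[1], $+[1]);
        if ($tok eq '{{') {
            if ($depth == 0) {
                # Entering an outermost span: keep the text before it.
                $out .= substr($text, $pos, $start - $pos);
                $open_at = $start;
            }
            $depth++;
        }
        elsif ($depth > 0) {
            # Closing the outermost span: emit the replacement.
            $out .= ' ' if --$depth == 0;
        }
        else {
            # Stray "}}" outside any span: keep it.
            $out .= substr($text, $pos, $end - $pos);
        }
        $pos = $end;
    }

    # Unclosed "{{": keep everything from the outermost opener onward.
    $pos = $open_at if $depth > 0;
    return $out . substr($text, $pos);
}

print strip_double_braces('this {{}}should be {{test {{ab{{c}}d}} zzzz}}good {{bw}}'), "\n";
```

The scan touches each delimiter once and keeps only a counter and the output buffer, so memory use does not depend on nesting depth.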
Re: Efficient way to match and replace nested braces (etc.)
by Anonymous Monk on Apr 12, 2012 at 21:14 UTC
    Another way of doing this is to build a regex with a finite number of nesting levels:
    my $text = 'this {{}}should be {{test {{ab{{c}}d}} zzzz}}good {{bw}}';
    my $nested = 100;
    my $re = '{{(?:[^{}]+|';
    $re .= $re x ($nested);
    $re .= ')*}}' x ($nested + 1);
    $text =~ s/$re//sg;
    print "$text\n";
    __END__
    OUTPUT: this should be good
      Even better:
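      my $re = '{{(?:[^{}]+|';
      # Note: "\b" in a double-quoted string is a backspace character (\x08),
      # so the innermost alternative matches a literal backspace rather than nothing.
      $re .= $re x ($nested) . "\b" . ')*}}' x ($nested + 1);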
Re: Efficient way to match and replace nested braces (etc.)
by Anonymous Monk on Apr 13, 2012 at 12:57 UTC
    use Regexp::Common qw /balanced/;
    my $re = qr{$RE{balanced}{-begin => "{{"}{-end => "}}"}};
    $text =~ s/$re//g;