Well I was going to post to say you should really be checking using a negated character class, rather than having all that backtracking going on. I was pretty sure it'd be faster, and it's what I would normally do when coding regexes like this.

I did a quick benchmark first, and it turns out I was wrong, the negated character class get relatively more and more inefficient the longer the data it has to scoop up is. Twice as much as proved here.

use strict; use Benchmark qw(:all) ; my $count = 50000; my $replacement_string = "this is a (" . "a"x1000 . ") test"; cmpthese($count, { 'negated' => sub { my $text = $replacement_string; $text =~ s|\([^)]*\)||sg; }, 'backtrack' => sub { my $text = $replacement_string; $text =~ s|\(.*?\)||sg; }, }); OUTPUT Rate negated backtrack negated 8562/s -- -67% backtrack 26316/s 207% --
I still think there's something to be said for the character class, as it is more explicit (after all, we are trying to match anything other than the closing bracket.), but it it certainly slower.

This surprised me, so I thought I'd post it, incase it surprised anyone else.


In reply to Re^2: strip out anything inbetween brackets by reasonablekeith
in thread strip out anything inbetween brackets by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.