in reply to Re: strip out anything inbetween brackets
in thread strip out anything inbetween brackets

Well I was going to post to say you should really be checking using a negated character class, rather than having all that backtracking going on. I was pretty sure it'd be faster, and it's what I would normally do when coding regexes like this.

I did a quick benchmark first, and it turns out I was wrong, the negated character class get relatively more and more inefficient the longer the data it has to scoop up is. Twice as much as proved here.

use strict; use Benchmark qw(:all) ; my $count = 50000; my $replacement_string = "this is a (" . "a"x1000 . ") test"; cmpthese($count, { 'negated' => sub { my $text = $replacement_string; $text =~ s|\([^)]*\)||sg; }, 'backtrack' => sub { my $text = $replacement_string; $text =~ s|\(.*?\)||sg; }, }); OUTPUT Rate negated backtrack negated 8562/s -- -67% backtrack 26316/s 207% --
I still think there's something to be said for the character class, as it is more explicit (after all, we are trying to match anything other than the closing bracket.), but it it certainly slower.

This surprised me, so I thought I'd post it, incase it surprised anyone else.