in reply to Avoiding regex backtracking

You are correct that the pattern .*?foo can be written as (?:[^f]+|f(?!oo))*foo, or the variant you showed. You are correct in assuming that this general principle can be used for practically any regex.

However, you should "unroll the loop" of that regex. Look at these results:

use Benchmark 'cmpthese'; my $str = "this is the best end"; cmpthese(-5, { extra => sub { $str =~ /(?:[^e]+|e(?!nd))*end/ }, plain => sub { $str =~ /.*?end/ }, }); __END__ extra: 12019.37/s (n=60818) plain: 49615.59/s (n=283305) Rate extra plain extra 12019/s -- -76% plain 49616/s 313% --
If we unroll the loop, we get better results.
use Benchmark 'cmpthese'; my $str = "this is the best end"; cmpthese(-5, { extra => sub { $str =~ /(?:[^e]+|e(?!nd))*end/ }, plain => sub { $str =~ /.*?end/ }, uroll => sub { $str =~ /[^e]*(?:e(?!nd)[^e]*)*end/ }, }); __END__ extra: 11711.62/s (n=61486) plain: 49329.33/s (n=250593) uroll: 17985.61/s (n=94964) Rate extra uroll plain extra 11712/s -- -35% -76% uroll 17986/s 54% -- -64% plain 49329/s 321% 174% --
But the point is, the internal optimization uses less regex op-codes, and so it is faster -- it has the engine do the work itself, it doesn't make the regex do the work.

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a (from-home) job
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Replies are listed 'Best First'.
Re^2: Avoiding regex backtracking
by Aristotle (Chancellor) on Mar 06, 2002 at 00:40 UTC

    I did in fact expect the dot-star-free version to eat up a lot of its "savings" due to the much higher complexity. I was also pretty sure that the regex engine would be using some special opimization for this case as it's so easy to spot. I just really should get into the habit of Benchmarking..

    Cheers everyone for very helpful replies.

    But basically that leaves me at square one: so when should I avoid dot star at least as a rule of thumb? What kind of construct should make me wary at first sight, what could be a red flag (as in the Perl.com program repair shop series) to look for? I am beginning to see a pattern (pun intended) I think - all examples I can recall off the top of my head use of several dot stars, causing excessive backtracking. Does that sound right?

    Makeshifts last the longest.