You see, there's only one way it could match, but there are plenty of ways it can fail (as in "almost succeed"). And a regex engine has a tendency to try them all; it's apparently not yet smart enough to know it can't succeed. Looking at the regex

    /(?:[^"]+|"")*/

and the string "around", there are plenty of ways this could possibly match:

    (around)
    (aroun)(d)
    (arou)(nd)
    (arou)(n)(d)
    ...

Each phrase between parens indicates a group as matched by the subpattern [^"]+, and all groups are matched as a whole by *. As you can see, there are many ways it can be split up.
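To get a feel for how many splits there are, here is a small sketch that simply enumerates them. It is only an illustration of the size of the search space, not a model of what the regex engine does internally; the helper name splits is made up for this example. It tries the longest first chunk first, which mimics the greedy [^"]+.

```perl
use strict;
use warnings;

# Hypothetical helper: list every way to cut a string into consecutive,
# non-empty chunks. Each result is an array ref of chunks.
sub splits {
    my ($str) = @_;
    return ([]) if $str eq '';              # one way to split nothing: no chunks
    my @out;
    for my $len (reverse 1 .. length($str)) {   # longest chunk first, like a greedy +
        my $head = substr $str, 0, $len;
        push @out, [ $head, @$_ ] for splits(substr $str, $len);
    }
    return @out;
}

my @ways = splits('around');
print scalar(@ways), " ways\n";                               # 32 ways
print join('', map { "($_)" } @$_), "\n" for @ways[0 .. 3];   # the four splits listed above
```

For a run of n characters that is 2**(n-1) splits, so the count doubles with every extra character.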
If you use the "cut" operator, there's only one way this can match:

    (around)

So it makes sense to expect that using the cut operator might yield a significant speed boost.
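A minimal way to see the "cut" operator, (?>...), refusing to give anything back is the sketch below. The trailing d$ is added purely for the demonstration and is not part of the regex under discussion; it forces the plain group to backtrack by one character, which the atomic group will not do.

```perl
use strict;
use warnings;

my $word = 'around';

# Plain group: [^"]+ first grabs all of "around", then gives back the final
# "d" so that the literal d$ can match.
my $plain = $word =~ /^(?:[^"]+)d$/ ? 1 : 0;

# Atomic ("cut") group: once [^"]+ has grabbed "around" it gives nothing
# back, so the literal d has nothing left to match and the match fails.
my $cut = $word =~ /^(?>[^"]+)d$/ ? 1 : 0;

print "plain: $plain, cut: $cut\n";   # plain: 1, cut: 0
```

The same refusal to retry shorter chunks is what removes the alternative splits of "around".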
But, without further ado, here's the benchmark, as run with perl 5.8.8:

    my $s = q("You spin me ""around"" and ""around"", ""round"", like a record, ""round"" and ""around"".);
    use Benchmark 'cmpthese';
    # Run each variant for at least 3 CPU seconds and compare the rates.
    cmpthese(-3, {
        cut      => sub { return $s =~ /"(?>[^"]+|"")*"(?!")/ },
        straight => sub { return $s =~ /"(?:[^"]+|"")*"(?!")/ },
    });

Results (neither the figures nor their ratios are always the same, so take them with a grain of salt):

                Rate straight      cut
    straight  7579/s       --     -89%
    cut      69921/s     823%       --

That's a factor of 9 by which /(?>[^"]+|"")*/ is faster than /(?:[^"]+|"")*/ for this string, and that is definitely not insignificant.
In reply to Re^4: Help with Double Double Quotes regular expression (imprecise)
by bart
in thread Help with Double Double Quotes regular expression
by mattford63