The core of the first regexp looks like this:
( [^']* (stuff)? )*
Because 'stuff' is optional, this becomes an exponential search: /( [^']* )*/x can match n characters 2n ways (as long as none is a quote character).*
The alternate regexp looks like this:
and now, because 'stuff' is not optional, the search is no longer exponential.[^']* ( stuff [^']* )*
This is the cause of the slowdown with the first version in your non-perl code. The same slowdown doesn't happen in perl because the regexp engine has some special tricksy handling for this situation, and if it sees something looking exponential it turns on some caching behaviour to avoid unnecessary repetition. (If you use re qw/debug/; you'll see this:
in the output when it happens.)Detected a super-linear match, switching on caching...
While I am not currently aware of bugs in that area, I kinda expect that there are some, and I suspect that's what is biting you here; the fact that the second regexp does not suffer the same problem is strong evidence for this. (I can confirm that the behaviour persists in the latest bleadperl.)
I'd suggest a) sending a perlbug about it, and b) trying real hard to remove exponential behaviour from regexps when you can.
Hugo
*Update: to be more strictly correct, /( [^']+ )*/x can match 2n ways, while /( [^']* )*/x can match infintely many ways. But the regexp engine automatically suppresses irrelevant multiple zero-length matches, so the effect is much the same.
In reply to Re: This regex seems to have splattered non-greedy everywhere
by hv
in thread This regex seems to have splattered non-greedy everywhere
by fizbin
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |