The core of the first regexp looks like this:

( [^']* (stuff)? )*

Because 'stuff' is optional, this becomes an exponential search: /( [^']* )*/x can match n characters 2n ways (as long as none is a quote character).*

The alternate regexp looks like this:

[^']* ( stuff [^']* )*
and now, because 'stuff' is not optional, the search is no longer exponential.

This is the cause of the slowdown with the first version in your non-perl code. The same slowdown doesn't happen in perl because the regexp engine has some special tricksy handling for this situation, and if it sees something looking exponential it turns on some caching behaviour to avoid unnecessary repetition. (If you use re qw/debug/; you'll see this:

Detected a super-linear match, switching on caching...
in the output when it happens.)

While I am not currently aware of bugs in that area, I kinda expect that there are some, and I suspect that's what is biting you here; the fact that the second regexp does not suffer the same problem is strong evidence for this. (I can confirm that the behaviour persists in the latest bleadperl.)

I'd suggest a) sending a perlbug about it, and b) trying real hard to remove exponential behaviour from regexps when you can.

Hugo

*Update: to be more strictly correct, /( [^']+ )*/x can match 2n ways, while /( [^']* )*/x can match infintely many ways. But the regexp engine automatically suppresses irrelevant multiple zero-length matches, so the effect is much the same.


In reply to Re: This regex seems to have splattered non-greedy everywhere by hv
in thread This regex seems to have splattered non-greedy everywhere by fizbin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.