Quick conjecture time (like usual)...

The greedy operators are optimized. They figure out what characters could occur directly after their match. So in the case of /^.*"/, the .* part knows that it will end right before a " and so does the equivalent of a rindex() to find the last " in the string. Then it lets the rest of the regex try to match. If that fails, then it backtracks to the previous ".
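To make that conjecture concrete, here is a minimal sketch in Python (not Perl, and certainly not the real engine code) of the strategy described above for a pattern shaped like /^.*"REST/. The function name `greedy_match_end` and the `rest` parameter are made up for illustration; `str.rfind` stands in for rindex().

```python
import re

def greedy_match_end(text, rest=""):
    """Emulate /^.*"REST/ the way the conjecture describes it.

    Hypothetical helper: returns the end offset of the match, or None.
    `rest` is whatever regex follows the quote (an assumption for the demo).
    """
    rest_re = re.compile(rest)
    # '.' does not match a newline, so .* can only cover the first line.
    newline = text.find("\n")
    limit = len(text) if newline == -1 else newline
    # Jump straight to the LAST quote that .* could stop in front of.
    pos = text.rfind('"', 0, limit)
    while pos != -1:
        # Let the rest of the regex try to match right after that quote.
        m = rest_re.match(text, pos + 1)
        if m:
            return m.end()
        # That failed: backtrack to the previous quote, not one character.
        pos = text.rfind('"', 0, pos)
    return None
```

With the string `say "hi" and "bye" now`, an empty `rest` stops after the last quote, while a `rest` of ` and` forces two backtracks to the quote after `hi`.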

I always assumed that the non-greedy operators were optimized the same way. But based on your benchmarks, I've changed my mind. It looks like, in the case of /^.*?"/, the .*? part doesn't compute what it might be followed by; it just starts out matching nothing and lets the rest of the regex try to match. If that doesn't work, it forwardtracks by one character and tries again. It should probably instead do the equivalent of index() to find the first " and then, if the rest of the regex fails, forwardtrack to the next " in this case.
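A rough Python sketch of the two non-greedy strategies, side by side, for a pattern shaped like /^.*?"REST/. Both functions and the `rest` parameter are hypothetical names, and the attempt counter is only there to show the difference in work; this illustrates the guess above and is not a description of what the engine actually does. `str.find` stands in for index().

```python
import re

def lazy_naive(text, rest=""):
    """Start by matching nothing, then forwardtrack one character at a time.

    Returns (end offset or None, number of times the rest was tried).
    """
    rest_re = re.compile('"' + rest)
    attempts = 0
    for pos in range(len(text) + 1):
        attempts += 1
        m = rest_re.match(text, pos)
        if m:
            return m.end(), attempts
        if pos < len(text) and text[pos] == "\n":
            break  # '.' cannot consume a newline, so .*? cannot grow further
    return None, attempts

def lazy_index(text, rest=""):
    """Jump to the first quote, then forwardtrack from quote to quote."""
    rest_re = re.compile(rest)
    newline = text.find("\n")
    limit = len(text) if newline == -1 else newline
    attempts = 0
    pos = text.find('"', 0, limit)
    while pos != -1:
        attempts += 1
        m = rest_re.match(text, pos + 1)
        if m:
            return m.end(), attempts
        pos = text.find('"', pos + 1, limit)
    return None, attempts
```

Both return the same match end; on a string with a long run of non-quote characters before the first quote, the naive version tries the rest of the regex once per character, while the index-based one tries it once per quote.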

Sorry, I don't have time to dig into the regex engine code right now. This would probably be an "easy" patch except for the fact that even easy patches to the regex engine require 12th-level deities.


In reply to Re: How are we lazy? by tye
in thread How are we lazy? by Ovid
