Why is the repeat failing? Is it because the non-greediness of the inner term somehow trumps the greediness of the outer?

Yes, this is due to the way the regex engine works. Perl will match the literal string "simple", and then match any number of characters, but as few as possible (.*?), subject to the constraints imposed by the rest of the pattern. But there IS no rest of the pattern, so there are no constraints, and Perl does its utmost to keep things short and matches zero extra characters.

Only after this is done does the + quantifier kick in. Since there isn't another literal "simple" immediately following what was already matched, nothing further is matched, and the entire match consists only of the initial "simple" followed by the empty string that the .*? matched.
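To see this in action, here is a minimal sketch. The input string is made up for illustration (it is not the one from the original question), and the pattern is assumed to have the shape (?:simple.*?)+ discussed above.

```perl
use strict;
use warnings;

# Hypothetical input, chosen only for illustration.
my $text = "simple one simple two simple three";

# Lazy .*? inside a repeated group, with nothing after the group:
# .*? settles for zero characters, and the + cannot start a second
# iteration because " one" rather than "simple" comes next.
my ($lazy) = $text =~ /((?:simple.*?)+)/;

print "matched: '$lazy'\n";                 # matched: 'simple'
print "length:  ", length($lazy), "\n";     # length:  6
```

The group runs exactly once, so the whole match is just the first "simple".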

Wait, I hear you say, there is more to the pattern! The + itself surely follows? However, that's not how the regex engine works; the + is part of the pattern currently being matched, and the fact that it trails the non-capturing group is a mere artifact of Perl's regex syntax. It helps to think of the + as being at the front of that group instead, where you'd also find other modifiers (e.g. (?i:...)).

So there is no pattern following the first, and Perl isn't cunning enough to match a bigger part of the string. Neither should it be: in order to do so, it'd have to ignore what you're explicitly telling it to do (match any number of characters, but as few as possible), just so that it could match more later on. And how would it know that this is what you wanted, anyway? Perl is a DWIMmy language, but it can't read minds yet. ;)
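If you do want the group to repeat, you have to say so in the pattern. The following is only a sketch of two possible ways to do that, again on a made-up input string; neither is necessarily what the original poster needed.

```perl
use strict;
use warnings;

# Hypothetical input, chosen only for illustration.
my $text = "simple one simple two simple three";

# Option 1: give the pattern a "rest". With $ after the group, .*? has
# to keep growing until either another "simple" or the end of the
# string follows.
my ($anchored) = $text =~ /((?:simple.*?)+)$/;

# Option 2: drop the laziness and instead forbid the filler from
# running into the next "simple" (negative lookahead on every char).
my ($tempered) = $text =~ /((?:simple(?:(?!simple).)*)+)/;

print "anchored: '$anchored'\n";   # anchored: 'simple one simple two simple three'
print "tempered: '$tempered'\n";   # tempered: 'simple one simple two simple three'
```

In both cases the engine is now told explicitly what may follow each "simple", so there is nothing left for it to guess.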

The regex engine's inner workings are explained in detail in chapter 5 of Programming Perl, BTW, in the section titled "The Little Engine That /Could(n't)?/".


In reply to Re^3: Matching a regular expression group multiple times by AppleFritter
in thread Matching a regular expression group multiple times by Anonymous Monk
