in thread Matching a regular expression group multiple times

Why is the repeat failing? Is it because the non-greediness of the inner term somehow trumps the greediness of the outer?

Yes, this is due to the way the regex engine works. Perl will match the literal string "simple", and then match any number of characters, but as few as possible (.*?), subject to the constraints imposed by the rest of the pattern. But there IS no rest of the pattern; so there are no constraints, and Perl does its utmost and matches zero extra characters.
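Here is a minimal sketch of the lazy quantifier on its own. It uses Python's `re` module instead of Perl, which is an assumption made so the example is runnable; both engines backtrack the same way for patterns this simple. The sample string is made up:

```python
import re

text = "simple foo simple bar"

# Nothing follows .*?, so the lazy quantifier settles for zero characters.
bare = re.match(r"simple.*?", text).group()

# Now something does follow .*?: it has to expand until "bar" can match.
constrained = re.match(r"simple.*?bar", text).group()

print(repr(bare))         # 'simple'
print(repr(constrained))  # 'simple foo simple bar'
```

Only the second pattern gives `.*?` a reason to consume anything.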

Only now, after this is done, does the + quantifier kick in. But since it finds that there isn't another literal "simple" following what was already matched, nothing further is matched, and the entire match consists only of the initial "simple" followed by the empty string that the .*? matched.
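The same kind of Python sketch for the whole construct. The pattern `(?:simple.*?)+` is reconstructed from the description above and may not be character-for-character what was posted earlier in the thread; the test strings are made up:

```python
import re

pattern = re.compile(r"(?:simple.*?)+")

# The loop only gets a second iteration when "simple" starts exactly where
# the first iteration (with its zero-length .*?) ended.
spaced = pattern.match("simple foo simple bar").group()
adjacent = pattern.match("simplesimple foo").group()

print(repr(spaced))    # 'simple'
print(repr(adjacent))  # 'simplesimple'
```

So the + does try again; it just has nothing to repeat on unless the next "simple" is right there.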

Wait, I hear you say, there is more to the pattern! The + itself surely follows? However, that's not how the regex engine works; the + is part of the pattern currently being matched, and the fact that it trails the non-capturing group is a mere artifact of Perl's regex syntax. It helps to think of the + as being at the front of that group instead, where you'd also find other modifiers (e.g. (?i:...)).
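One way to convince yourself that the + is not a trailing constraint is to add a real one, such as the `$` end-of-string anchor, and compare. Again a Python sketch with an assumed pattern and a made-up string:

```python
import re

text = "simple foo simple bar"

# With only the +, one iteration is enough and the match is already complete.
loop_only = re.match(r"(?:simple.*?)+", text).group()

# $ is a genuine element after the group, so it constrains .*?: the engine
# backtracks into the lazy quantifier until $ can match.
anchored = re.match(r"(?:simple.*?)+$", text).group()

print(repr(loop_only))  # 'simple'
print(repr(anchored))   # 'simple foo simple bar'
```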

So there is no pattern following the first, and Perl isn't cunning enough to match a bigger part of the string. Nor should it be: to do so, it would have to ignore what you're explicitly telling it to do (match any number of characters, but as few as possible), just so it could match more later on. And how would it know that this is what you wanted, anyway? Perl is a DWIMmy language, but it can't read minds yet. ;)
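If you do want more, you have to say so in the pattern. One possible approach, assuming the goal is to grab each "simple" plus whatever runs up to the next "simple" (or the end of the string), is a lookahead that tells `.*?` exactly where to stop. Perl supports the same `(?=...)` syntax; this is a Python sketch, and the interpretation of the goal is a guess:

```python
import re

text = "simple foo simple bar"

# Each lazy .*? must stop just before the next "simple" or at the end of the
# string; the lookahead checks this without consuming any characters.
chunks = re.findall(r"simple.*?(?=simple|$)", text)

# The same stopping rule inside the repeated group lets + match every chunk.
whole = re.match(r"(?:simple.*?(?=simple|$))+", text).group()

print(chunks)       # ['simple foo ', 'simple bar']
print(repr(whole))  # 'simple foo simple bar'
```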

The regex engine's inner workings are explained in detail in chapter 5 of Programming Perl, BTW, in the section titled "The Little Engine That /Could(n't)?/".
