in reply to Regex Minimal Quantifiers

BTW, this looks like a copy & paste of some code from perlre.

When you add a use re 'debug'; to your code (or use re Debug => 'EXECUTE';), it outputs this:

Matching REx "(.*?)(\d*)" against "I have 2 numbers: 53147"
   0 <> <I have 2 n> | 0| 1:OPEN1(3)
   0 <> <I have 2 n> | 0| 3:MINMOD(4)
   0 <> <I have 2 n> | 0| 4:STAR(6)
   0 <> <I have 2 n> | 1| 6:CLOSE1(8)
   0 <> <I have 2 n> | 1| 8:OPEN2(10)
   0 <> <I have 2 n> | 1| 10:STAR(12)
                     | 1| POSIXU[\d] can match 0 times out of 2147483647...
   0 <> <I have 2 n> | 2| 12:CLOSE2(14)
   0 <> <I have 2 n> | 2| 14:END(0)
Match successful!
(.*?)(\d*) <> <>

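The non-greedy modifier ? makes .* match the minimum number of times possible, which here is zero. The \d* that follows is also allowed to match zero times, and because the first character of the string ("I") is not a digit, it does. The regex therefore succeeds right at the start of the string, having matched zero characters.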
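To check this without reading the debug trace, here is a minimal sketch using the same string and pattern as above. The variable names ($str, $lazy, $digits, $start, $end) are my own for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Same string and pattern as in the debug output above.
my $str = "I have 2 numbers: 53147";

my ($lazy, $digits, $start, $end);
if ( $str =~ /(.*?)(\d*)/ ) {
    ($lazy, $digits) = ($1, $2);
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    ($start, $end) = ($-[0], $+[0]);
    printf "lazy='%s' digits='%s' start=%d end=%d\n",
        $lazy, $digits, $start, $end;
}
```

Both captures come back as empty strings, and the start and end offsets are both 0, so the match is an empty match at the very beginning of the string.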

Re^2: Regex Minimal Quantifiers
by pr33 (Scribe) on May 24, 2017 at 15:23 UTC

    Thanks.

    All I wanted to know is whether (.*?) matches 0 characters at the start of the string. I am aware that \d* matches 0 or more digits.

    In this case, both captures match 0 characters at the start of the string and return nothing.

    In the case of (.*?)(\d+), I got confused about how (.*?) matches 'I have ' instead of an empty string.

      In the case of (.*?)(\d+), I got confused about how (.*?) matches 'I have ' instead of an empty string.

      The reason is that the regex engine always works from left to right, which is why the .*? begins matching at the beginning of the string - again, use re 'debug'; to see it in action. If you wanted to capture only the digit(s), you could write your regex as /(\d+)/, which has to skip everything that's not a digit at the beginning of the string for it to begin matching.

      Update: If by "if the (.*?) matches 0 characters at the start of the string" you are referring to the /(.*?)(\d+)/ regex, then note that the overall regex still has to match, so .*? doesn't match zero characters in this case: the regex engine again works from left to right, and the .*? consumes the "I have " so that the \d+ can match the "2".
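The two patterns discussed in this reply can be compared side by side. This is only a small sketch on the string from the thread, and the variable names ($prefix, $num, $only, $offset) are made up for the example.

```perl
use strict;
use warnings;

my $str = "I have 2 numbers: 53147";

# \d+ needs at least one digit, so .*? is extended one character at a
# time until the digit "2" can match right after it.
my ($prefix, $num) = $str =~ /(.*?)(\d+)/;
printf "prefix='%s' num='%s'\n", $prefix, $num;    # prefix='I have ' num='2'

# With only (\d+), the engine tries each start position from the left
# until one succeeds; $-[0] is the offset where the match began.
my ($only) = $str =~ /(\d+)/;
my $offset = $-[0];
printf "only='%s' offset=%d\n", $only, $offset;    # only='2' offset=7
```

In both cases the digits captured are the first ones found scanning from the left ("2", not "53147"). The difference is only whether the text before them ends up inside a capture group or is skipped.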

        Thanks for the explanation. Using re helps me understand it much better.