pr33 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

Can anyone please explain the code below? Shouldn't the minimal quantifier (.*?) in this case match 'I have ' for the first capture, and then everything from the number 2 to the end of the string for the second capture $2? The output shows nothing for $1 and $2.

Here is my code

#!/usr/bin/perl
use warnings;
use strict;
###############
my $str = "I have 2 numbers: 53147";
my @pats = qw { (.*?)(\d*) };
foreach my $pat (@pats) {
    printf "%-12s ", $pat;
    if ( $str =~ /$pat/ ) {
        print "<$1> <$2>\n";
    } else {
        print "FAIL\n";
    }
}

$ ./regex.pl

(.*?)(\d*) <> <>

Replies are listed 'Best First'.
Re: Regex Minimal Quantifiers
by huck (Prior) on May 24, 2017 at 04:54 UTC

    \d* also matches zero digits, so it can succeed without consuming anything. With \d+ at least one digit is required:

    my $str = "I have 2 numbers: 53147";
    if ( $str =~ /(.*?)(\d+)/ ) {
        print "<$1> <$2>\n";
    } else {
        print "FAIL\n";
    }

    Result:
    <I have > <2>

      Thanks Huck

Re: Regex Minimal Quantifiers
by haukex (Archbishop) on May 24, 2017 at 06:00 UTC

    BTW, this looks like a copy & paste of some code from perlre.

    When you add a use re 'debug'; to your code (or use re Debug => 'EXECUTE';), it outputs this:

    Matching REx "(.*?)(\d*)" against "I have 2 numbers: 53147"
       0 <> <I have 2 n> | 0| 1:OPEN1(3)
       0 <> <I have 2 n> | 0| 3:MINMOD(4)
       0 <> <I have 2 n> | 0| 4:STAR(6)
       0 <> <I have 2 n> | 1| 6:CLOSE1(8)
       0 <> <I have 2 n> | 1| 8:OPEN2(10)
       0 <> <I have 2 n> | 1| 10:STAR(12)
                         | 1| POSIXU[\d] can match 0 times out of 2147483647...
       0 <> <I have 2 n> | 2| 12:CLOSE2(14)
       0 <> <I have 2 n> | 2| 14:END(0)
    Match successful!
    (.*?)(\d*) <> <>

    The non-greedy modifier ? makes .* match as few characters as possible, which here is zero. \d* then also matches zero digits (the character at position 0 is "I", not a digit), so the regex succeeds right at the start of the string, having matched zero characters.
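To check the zero-length match without reading the debug trace, one option is to look at the match offsets. This is only a small sketch, assuming the same $str as in the original post; the variable names $first, $second, $start and $end are mine, and @- and @+ are Perl's built-in arrays of match start and end offsets.

```perl
use strict;
use warnings;

my $str = "I have 2 numbers: 53147";

# A match in list context returns the captures; afterwards $-[0] and
# $+[0] hold the start and end offsets of the overall match.
my ( $first, $second ) = $str =~ /(.*?)(\d*)/;
my ( $start, $end ) = ( $-[0], $+[0] );

printf "match spans offsets %d to %d (length %d)\n",
    $start, $end, $end - $start;
printf "first capture is %d chars, second is %d chars\n",
    length($first), length($second);
```

Both offsets come out as 0 and both captures have length 0, which agrees with the "Match successful!" line in the trace: the match succeeds, it is just empty.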

      Thanks.

      All I wanted to know was whether the (.*?) matches 0 characters at the start of the string. I am aware that \d* matches 0 or more digits.

      In this case, both captures match 0 characters at the start of the string and return nothing.

      In the case of (.*?)(\d+), I got confused about how (.*?) matches 'I have ' instead of an empty string.

        In the case of (.*?)(\d+), I got confused about how (.*?) matches 'I have ' instead of an empty string.

        The reason is that the regex engine always works from left to right, which is why the .*? begins matching at the beginning of the string - again, use re 'debug'; to see it in action. If you wanted to capture only the digit(s), you could write your regex as /(\d+)/, which has to skip everything that's not a digit at the beginning of the string for it to begin matching.
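As a quick illustration of the /(\d+)/ suggestion, here is a minimal sketch, again assuming the $str from the original post. The names $digits and $offset are mine; $-[0] is the built-in start offset of the last successful match.

```perl
use strict;
use warnings;

my $str = "I have 2 numbers: 53147";

# With no .*? in front, the engine tries each starting position in turn
# until \d+ can match, so the match itself begins at the first digit.
my ($digits) = $str =~ /(\d+)/;
my $offset = $-[0];

print "captured <$digits> at offset $offset\n";
```

This prints "captured <2> at offset 7": the seven characters of "I have " are skipped rather than captured.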

        Update: If by "if the (.*?) matches 0 characters at the start of the string" you are referring to the /(.*?)(\d+)/ regex, then note that the overall regex still has to match. So .*? doesn't match zero characters in this case: the regex engine again works from left to right, and the .*? consumes the "I have " so that the \d+ can match the "2".
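Comparing the minimal and the greedy quantifier side by side may make this easier to see. A small sketch, assuming the same $str as above; the array names @lazy and @greedy are just for illustration.

```perl
use strict;
use warnings;

my $str = "I have 2 numbers: 53147";

# Minimal: .*? grows one character at a time until \d+ can match.
my @lazy = $str =~ /(.*?)(\d+)/;

# Greedy: .* takes everything, then gives back just enough for \d+.
my @greedy = $str =~ /(.*)(\d+)/;

print "lazy:   <$lazy[0]> <$lazy[1]>\n";
print "greedy: <$greedy[0]> <$greedy[1]>\n";
```

The first line shows <I have > <2>, the second <I have 2 numbers: 5314> <7>. In both cases the quantifier only decides which of the possible overall matches is preferred; it never makes the regex settle for a failed match.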

Re: Regex Minimal Quantifiers
by hippo (Archbishop) on May 24, 2017 at 08:05 UTC
    and then from number 2 till end of the string for second capture $2

    No, that's not what (\d*) means. If you wanted that you should use (\d.*) instead. The asterisk means any number (including zero) of the preceding item.
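For completeness, a short sketch of the (\d.*) variant, assuming the same $str as in the original post; $prefix and $rest are names chosen here for illustration.

```perl
use strict;
use warnings;

my $str = "I have 2 numbers: 53147";

# \d requires one digit, then the greedy .* runs to the end of the string.
my ( $prefix, $rest ) = $str =~ /(.*?)(\d.*)/;

print "<$prefix> <$rest>\n";
```

This prints <I have > <2 numbers: 53147>, which is the split the original question expected.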