Danny has asked for the wisdom of the Perl Monks concerning the following question:

In the following, '(.*)' consumes the rest of the string as expected:
$_ = "a b=c";
/(a)(.*)(?:b=(\w))?/;
printf "\$2 is [%s]\n", defined $2 ? $2 : "undefined";
printf "\$3 is [%s]\n", defined $3 ? $3 : "undefined";
prints
$2 is [ b=c]
$3 is [undefined]
I thought that using '(.*?)' would match up to 'b' then allow the optional '(?:b=(\w))?' to match, but this doesn't seem to be the case. Since they are both optional I guess it doesn't work. How can I consume everything up to a potential (?:b=(\w))? and still get (?:b=(\w))? to match?
$_ = "a b=c";
/(a)(.*?)(?:b=(\w))?/;
printf "\$2 is [%s]\n", defined $2 ? $2 : "undefined";
printf "\$3 is [%s]\n", defined $3 ? $3 : "undefined";
prints
$2 is []
$3 is [undefined]

Replies are listed 'Best First'.
Re: RE greediness
by ikegami (Patriarch) on May 29, 2024 at 19:35 UTC

    I thought that using '(.*?)' would match up to 'b'

    Why would it do that when it can match zero times?

    1. (a) matches 1 character at position 0.
    2. (.*?) attempts to match . 0 times at position 1.
    3. (.*?) matches 0 characters at position 1.
    4. (?:b=(\w))? attempts to match b=(\w) 1 time at position 1.
    5. b=(\w) fails to match at position 1.
    6. (?:b=(\w))? attempts to match b=(\w) 0 times at position 1.
    7. (?:b=(\w))? matches 0 characters at position 1.
    8. Successful match.
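The trace above can be checked directly. This is a minimal sketch, not part of the original reply: `lazy_match` is a hypothetical helper that runs the lazy pattern from the question and reports where the overall match ends, using Perl's built-in `@-` and `@+` offset arrays.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run the lazy pattern from the
# question and return the matched text, its end offset, and $2 / $3.
sub lazy_match {
    my ($str) = @_;
    return unless $str =~ /(a)(.*?)(?:b=(\w))?/;
    return {
        whole => substr($str, $-[0], $+[0] - $-[0]),  # text of the whole match
        end   => $+[0],                               # offset just past the match
        g2    => $2,
        g3    => $3,
    };
}

my $m = lazy_match("a b=c");
printf "matched [%s], ending at offset %d\n", $m->{whole}, $m->{end};
printf "\$2 is [%s]\n", defined $m->{g2} ? $m->{g2} : "undefined";
printf "\$3 is [%s]\n", defined $m->{g3} ? $m->{g3} : "undefined";
```

The whole match is just `a`, ending at offset 1: the regex engine never needs to look at the space or at `b=c`, which is consistent with steps 2 through 8.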
      Makes sense!

      EDIT:
        Why would it do that when it can match zero times?
      I guess I was thinking that since (?:b=(\w))? prefers to match once it would influence the behavior of .*? in an analogous way to how (?:b=(\w)) would, which, as you explained, isn't the case.
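The difference described here can be seen by running both variants side by side. The following is only an illustrative sketch: `groups` and `show` are hypothetical helpers, not anything posted in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply a pattern and
# return ($2, $3) on success, or an empty list on failure.
sub groups {
    my ($re, $str) = @_;
    return $str =~ $re ? ($2, $3) : ();
}

# Format one capture for printing, as in the original posts.
sub show { my ($v) = @_; return defined $v ? $v : "undefined" }

my $str = "a b=c";
my @optional = groups(qr/(a)(.*?)(?:b=(\w))?/, $str);  # trailing group optional
my @required = groups(qr/(a)(.*?)(?:b=(\w))/,  $str);  # trailing group required

printf "optional: \$2 is [%s], \$3 is [%s]\n", map { show($_) } @optional;
printf "required: \$2 is [%s], \$3 is [%s]\n", map { show($_) } @required;
```

With the required group, `.*?` is forced to expand until `b=(\w)` can match, so `$2` is a single space and `$3` is `c`. With the optional group, the overall match can already succeed with `.*?` at zero length, so nothing pushes it further. The cost of the required form is that it fails outright on a string with no `b=` part, such as "a c=c".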

Re: RE greediness
by tybalt89 (Monsignor) on May 29, 2024 at 18:56 UTC
    #!/usr/bin/perl
    use strict; # https://perlmonks.org/?node_id=11159717
    use warnings;

    $_ = "a b=c";
    /(a)(.*?)(?|b=(\w)|()\z)/ or die;
    printf "\$2 is [%s]\n", defined $2 ? $2 : "undefined";
    printf "\$3 is [%s]\n", defined $3 ? $3 : "undefined";

    $_ = "a c=c";
    /(a)(.*?)(?|b=(\w)|()\z)/ or die;
    printf "\$2 is [%s]\n", defined $2 ? $2 : "undefined";
    printf "\$3 is [%s]\n", defined $3 ? $3 : "undefined";

    Outputs:

    $2 is [ ]
    $3 is [c]
    $2 is [ c=c]
    $3 is []
      Nice! I like your use of ?|. I similarly ended up using an OR-ed expression in the last part, but a bit different since the real strings had some other baggage.
Re: RE greediness
by GrandFather (Saint) on May 29, 2024 at 21:13 UTC

    As others have suggested, a positive lookahead match, (?=...), is likely to be key to solving your unspecified problem. We may be able to help better if you let us know what higher-level problem you are trying to solve.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: RE greediness
by Anonymous Monk on May 29, 2024 at 19:11 UTC
    The (.*?) in your second attempt is a non-greedy match, which means it will match the smallest possible substring that satisfies the pattern. In this case, it matches an empty string because the optional (?:b=(\w))? can also match the empty string. Use a positive lookahead assertion to ensure that the (.*?) matches up to the optional b=(\w) part, but doesn't include it:
    /(a)(?:(.*?)(?:b=(\w)))?/
      (?:...) is a non-capturing group, positive lookahead uses (?=...) which I can't find in the regex used.

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
          (?:...) is a non-capturing group, positive lookahead uses (?=...) which I can't find in the regex used.
        Sorry for the typo, meant to type this, but it's still wrong:
        /(a)(?=(.*?)(?:b=(\w)))?/
        This seems to work:
        /(a)(.*?)(?=b=(\w)|\z)/
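A quick way to check that last pattern is to run it against both test strings used earlier in the thread. This is a sketch only: `lookahead_match` is a hypothetical helper introduced for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run the lookahead pattern and
# return ($2, $3, end offset of the whole match), or an empty list on failure.
sub lookahead_match {
    my ($str) = @_;
    return $str =~ /(a)(.*?)(?=b=(\w)|\z)/ ? ($2, $3, $+[0]) : ();
}

for my $str ("a b=c", "a c=c") {
    my ($g2, $g3, $end) = lookahead_match($str);
    printf "%s: \$2 is [%s], \$3 is [%s], match ends at offset %d\n",
        $str, $g2, defined $g3 ? $g3 : "undefined", $end;
}
```

Because the lookahead is zero-width, the overall match stops just before `b=c` (offset 2 for "a b=c"), yet `$3` is still set to `c`. When there is no `b=` part, `.*?` runs to the end of the string and `$3` stays undefined, whereas the `(?|b=(\w)|()\z)` version shown earlier in the thread leaves `$3` as an empty string.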