abhy has asked for the wisdom of the Perl Monks concerning the following question:

    my $foo = "bar road";
    if ($foo =~ /\b\w+\s*(road)?/) {
        print "Matched:", $1, "\n";
    }

In the above code, "road" is captured in $1, as expected.

Now if I change the regex to /\b\w+\s*?(road)?/, that is, make the whitespace matching lazy, then "road" is not captured.

Can someone please explain why this happens as soon as I make the whitespace matching lazy?

Re: Regex lazy behaviour
by GrandFather (Saint) on Mar 03, 2009 at 10:35 UTC

    You have made the road match optional. With a greedy \s*, the space is consumed first, so road can then be matched. If you go the minimal-match route with \s*?, no whitespace gets matched; road cannot match starting at the space, and because the road match is optional, the engine simply skips it, so road doesn't get matched either.

    Actually, a greedy \s* is always OK if the next thing to be matched is \S, because all the whitespace has to be matched in any case.
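A small script can illustrate GrandFather's second point. It is a sketch using only core Perl, and it makes the road match mandatory rather than optional (the variable names $greedy and $lazy are just for illustration). Once (road) is required, the lazy \s*? has to grow over the space before road can match, so both forms should capture the same thing.

```perl
use strict;
use warnings;

my $foo = "bar road";

# With (road) mandatory, the engine must consume the space before it can
# match "road", so greedy \s* and lazy \s*? end up matching the same text.
# In list context a successful match returns the captures.
my ($greedy) = $foo =~ /\b\w+\s*(road)/;
my ($lazy)   = $foo =~ /\b\w+\s*?(road)/;

print "greedy: $greedy\n";   # greedy: road
print "lazy:   $lazy\n";     # lazy:   road
```

The lazy version gets there by backtracking: \s*? first matches nothing, (road) fails at the space, and only then does \s*? extend by one character.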


    True laziness is hard work
Re: Regex lazy behaviour
by moritz (Cardinal) on Mar 03, 2009 at 10:29 UTC
    \s*? matches zero spaces, then the regex engine tries to match (road)?. It succeeds with the "zero" option that the "zero-or-one" quantifier ? offers. At that point the overall match has succeeded, so there is no need to backtrack.
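As a rough check of moritz's explanation, the following sketch (core Perl only; $matched and $captured are illustrative names) looks at what the lazy pattern matched as a whole via $&. If the engine stops as soon as (road)? takes its zero option, the whole match should be just "bar" and $1 should be undefined.

```perl
use strict;
use warnings;

my $foo = "bar road";

my ($matched, $captured);
if ($foo =~ /\b\w+\s*?(road)?/) {
    $matched  = $&;   # text of the whole match
    $captured = $1;   # undef here: group 1 never participated
}

print "whole match: '$matched'\n";                               # whole match: 'bar'
print "\$1: ", (defined $captured ? $captured : 'undef'), "\n";  # $1: undef
```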

    If you want road to get captured anyway, keep \s* greedy and leave (road)? optional (and greedy).
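Here is a minimal sketch of moritz's suggestion, alongside one alternative that was not proposed in the thread: putting the whitespace into a non-capturing group together with road, so that "whitespace plus road" is the optional unit. The variable names $fix1 and $fix2 are illustrative.

```perl
use strict;
use warnings;

my $foo = "bar road";

# moritz's suggestion: keep \s* greedy and leave (road)? optional.
my ($fix1) = $foo =~ /\b\w+\s*(road)?/;

# Alternative (not from the thread): make "whitespace plus road" the
# optional unit. The group is tried before it is skipped, so the lazy
# \s*? is extended over the space until (road) can match.
my ($fix2) = $foo =~ /\b\w+(?:\s*?(road))?/;

print "fix1: $fix1\n";   # fix1: road
print "fix2: $fix2\n";   # fix2: road
```

In the second pattern the whitespace quantifier is still lazy; it captures road only because the surrounding optional group prefers matching over skipping, which forces \s*? to grow.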

Re: Regex lazy behaviour
by ELISHEVA (Prior) on Mar 03, 2009 at 10:40 UTC
    To clarify a bit:

    With the greedy \s*:
        \b matches before "bar"
        \w+ matches "bar"
        \s* matches as many spaces as it can swallow (it's greedy), i.e. " "
        (road)? matches "road"

    but with the lazy \s*?:
        \b matches before "bar"
        \w+ matches "bar"
        \s*? matches the *empty* string (it isn't greedy)
        so (road)? attempts to match " road", which of course fails
        (road)? then falls back to matching the empty string, so the overall match succeeds with $1 undefined
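ELISHEVA's step-by-step description can be observed directly with an embedded code block, (?{ ... }), which runs when the engine reaches that point in the pattern; inside it, pos() gives the current match offset in the target string. This is a sketch assuming a reasonably modern perl; @greedy_at and @lazy_at are hypothetical names that just collect the offset at which the whitespace quantifier finished.

```perl
use strict;
use warnings;

my $foo = "bar road";

# Record pos() right after the whitespace quantifier. Inside (?{ ... }),
# pos() reports how far into $foo the engine has matched so far.
my (@greedy_at, @lazy_at);
$foo =~ /\b\w+\s*(?{ push @greedy_at, pos() })(road)?/;
$foo =~ /\b\w+\s*?(?{ push @lazy_at, pos() })(road)?/;

print "greedy: whitespace ended at offset @greedy_at\n";  # greedy: whitespace ended at offset 4
print "lazy:   whitespace ended at offset @lazy_at\n";    # lazy:   whitespace ended at offset 3
```

Offset 4 is just after the space and offset 3 is right after "bar", which is where (road)? starts looking in each case.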
Re: Regex lazy behaviour
by abhy (Novice) on Mar 04, 2009 at 06:18 UTC
    Thanks monks.