VingInMedina has asked for the wisdom of the Perl Monks concerning the following question:

Can anyone please explain to me why the first regular expression test will match, but the second one won't?

#!/usr/bin/perl
use strict;
use warnings;

my $a = 'abc == 123';

if ($a =~ /^(.*?)==(.*)$/) {
    print "This will match\n";
    print "First: $1\nLast: $2\n";
}

if ($a =~ /^(.*?)\b==\b(.*)$/) {
    print "This will not match\n";
    print "First: $1\nLast: $2\n";
}

I need to use the non-greedy match in the first part of the expression because I need to find the first occurrence of the '=='.

Re: Regular Expression Question
by davido (Cardinal) on May 02, 2012 at 18:36 UTC

    The second one requires a word boundary to exist immediately to the left and to the right of the ==. Your string, "abc == 123" has white-space immediately to the left and right of the ==. Consequently, the word boundaries are between 'c' and 'space', and again between 'space' and '1'. There is no boundary between 'space' and '=', nor between '=' and 'space'.

    Thus, the second RE cannot match against your target string. To put it a little more clearly (I hope):

    abc[boundary](space)==(space)[boundary]123   <---- What your string looks like.
    abc(space)[boundary]==[boundary](space)123   <---- What you're trying to match.

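    That second case is impossible; there can never be a word boundary between a space and an equals sign, because \b only matches between a word character (\w) and a non-word character, and both the space and the '=' are non-word characters.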
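
    If you want to see this for yourself, a small sketch like the following lists every offset at which \b matches in your string. The variable names ($str, @boundaries) are mine and not from your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'abc == 123';

# \b is zero-width, so record pos() after each successful match.
my @boundaries;
while ($str =~ /\b/g) {
    push @boundaries, pos($str);
}

# The '==' occupies offsets 4 and 5, so /\b==\b/ would need
# boundaries at offsets 4 and 6.
print "Word boundaries at offsets: @boundaries\n";
# prints: Word boundaries at offsets: 0 3 7 10
```

    Neither 4 nor 6 appears in that list, which is why the \b==\b version has nowhere to match.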

    Rather than using non-greedy quantifiers, why don't you instead specify what would be considered legal on the left-hand side of the ==? It's likely that your left-hand side needs to be an identifier of some sort. If that's the case, specify which characters are legal:

    m/^(\w+)\s*==\s*(\w+)$/
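
    As a rough illustration of how that pattern behaves, here is a sketch that runs it against a few inputs. The helper name split_comparison and the sample strings are made up for the example, and it assumes both sides are plain \w+ tokens.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (left, right) on a match, or an empty list.
sub split_comparison {
    my ($text) = @_;
    if ($text =~ m/^(\w+)\s*==\s*(\w+)$/) {
        return ($1, $2);
    }
    return;
}

for my $input ('abc == 123', 'abc==123', 'a b == 1') {
    my @parts = split_comparison($input);
    if (@parts) {
        print "'$input' -> First: $parts[0], Last: $parts[1]\n";
    }
    else {
        print "'$input' -> no match\n";
    }
}
```

    The third input is rejected because the space inside the left-hand side is not a \w character; widen the character class if your identifiers allow more than that.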

    If either side can contain quoted literals that may embed '==', you're going to have to use a real parser instead.


    Dave

      Thanks Dave

      I think I got it