jmclark has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I've boiled my problem down to the simplest following code:
    $gw = "abcdefgh";
    if ($gw =~ m/abc/ig)   { print "abc\n"; }
    if ($gw =~ m/cde/ig)   { print "cde\n"; }
    if ($gw =~ m/defgh/ig) { print "defgh\n"; }
    if ($gw =~ m/gh/ig)    { print "gh\n"; }
    if ($gw =~ m/fg/ig)    { print "fg\n"; }
Can someone explain to me why the above code only prints for every other if statement? For instance, the above only spits out:
    abc
    defgh
    fg
It seems to me that the above should print:
    abc
    cde
    defgh
    gh
    fg
Is there something I'm missing? I've got a script that checks the name of a device, and based on parts of its name I then know the device's function and need to run other things against that device. It is entirely possible that anywhere from 1 to 5 of the if statements could match the device name and need additional code run in my script. The only workaround I've come up with at this point is to basically say:
    $gw1, $gw2, $gw3, $gw4, $gw5 = $gw;
    if ($gw =~....
    if ($gw1 =~....
    if ($gw2 =~....
    etc...
It also doesn't matter if you change the order of the if statements, it will always match the 1st, skip the 2nd, match the 3rd, skip the 4th, etc. Thanks and I look forward again to the wisdom!!

Re: Multiple if statements matching part of one variable problem
by GrandFather (Saint) on Sep 10, 2008 at 04:33 UTC

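    You are using the g modifier on the regex (why?). In scalar context, the first match leaves pos($gw) just after the 'c', so the subsequent match starts from there and fails. A failed /g match resets pos($gw), which is why the third match succeeds again, and so on. For more info peruse Regexp Quote-Like Operators in perlop.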
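The effect of pos() described above can be made visible with a small trace. This is only a sketch: `trace_matches` is a hypothetical helper name, not something from the original script, and it works on its own copy of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: run each pattern against the same string with /g
# in scalar context and record the outcome plus pos() afterwards.
sub trace_matches {
    my ($string, @patterns) = @_;
    my @log;
    for my $pat (@patterns) {
        # Scalar-context /g match: the search resumes at pos($string).
        my $matched = $string =~ /$pat/ig;
        # pos() is undef again after a failed /g match.
        my $pos = pos($string);
        push @log, join ' ',
            $pat,
            ($matched ? 'matched' : 'failed'),
            'pos=' . (defined $pos ? $pos : 'undef');
    }
    return @log;
}

print "$_\n" for trace_matches('abcdefgh', qw(abc cde defgh gh fg));
```

Each failed line shows pos() back at undef, so the following pattern searches from the start of the string again. That gives the match, skip, match, skip pattern from the question.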


    Perl reduces RSI - it saves typing
Re: Multiple if statements matching part of one variable problem
by pat_mc (Pilgrim) on Sep 10, 2008 at 08:49 UTC
    Hi, jmclark -

    The question you ask is, I think, one that everyone working with regular expressions has at some stage. As GrandFather already explained, the problem is due to the way the g modifier makes Perl's regular expression engine remember the match position in $gw.
    One way to avoid this kind of undesired interference of the match position while keeping the g modifier is to run each check against a fresh copy of the string, as follows:
        $gw = "abcdefgh";
        if ( ( my $a = $gw ) =~ m/abc/ig )   { print pos( $a ), ": abc\n"; }
        if ( ( my $b = $gw ) =~ m/cde/ig )   { print pos( $b ), ": cde\n"; }
        if ( ( my $c = $gw ) =~ m/defgh/ig ) { print pos( $c ), ": defgh\n"; }
        if ( ( my $d = $gw ) =~ m/gh/ig )    { print pos( $d ), ": gh\n"; }
        if ( ( my $e = $gw ) =~ m/fg/ig )    { print pos( $e ), ": fg\n"; }
    Effectively, you are thus copying the contents of $gw into the variables $a, $b ... and are not directly checking for the match of $gw anymore. This approach will then return what you had initially expected:
        3: abc
        5: cde
        8: defgh
        8: gh
        7: fg
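For comparison, if the match position is not needed, a sketch without the g modifier avoids the copies altogether. `matching_parts` is a hypothetical helper name used only for illustration; the patterns are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return every pattern that matches $name.
# Without /g each match searches from the start of the string,
# regardless of what earlier matches did.
sub matching_parts {
    my ($name, @patterns) = @_;
    return grep { $name =~ /$_/i } @patterns;
}

my $gw = 'abcdefgh';
print "$_\n" for matching_parts($gw, qw(abc cde defgh gh fg));
```

All five patterns are reported for this string, in the order they were given.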
    Hope this helps.

    Regards -

    Pat
      Ah, ok, so that's a "feature" of Perl's regex engine. So basically that code is doing the same thing as me, just duplicating the $gw var and then matching against those newly created variables. Maybe it's my misunderstanding of "g", but I was thinking that it meant globally match anywhere in the string. So if I wanted to match "def" in the string "abcdefg" then I would use the "g", otherwise I'd have to match on something like /.*def.*/. Is that incorrect?

        Yes, that's completely incorrect.

        First, "g" doesn't do that at all. It means (more or less) "all instances".

            # Check for a match
            if (/pat/) { ... }

            # Find all matches
            while (/pat/g) { ... }

        Second, that's not how regexps match at all.

            print( 'abc'   =~ /b/         ?1:0,"\n");   # 1
            print( 'abc'   =~ /^b\z/      ?1:0,"\n");   # 0
            print( 'abc'   =~ /^abc\z/    ?1:0,"\n");   # 1
            print( 'a2b3c' =~ /(\d)/      ?$1:0,"\n");  # 2
            print( 'a2b3c' =~ /^.*(\d)/   ?$1:0,"\n");  # 3
            print( 'a2b3c' =~ /^.*?(\d)/  ?$1:0,"\n");  # 2


        Yes, that is misunderstanding the role of the g modifier. Simplistically /g means "match as many times as you can". In a list context that means the regex will return all the matches it finds. Consider:

            my @matches = '1 foo 22 bar 3' =~ /\d+/g;
            print "@matches";

        Prints:

        1 22 3

        In scalar context however it returns true while there is a "next" match. To see what was matched we now have to capture the bit we are interested in:

            while (my $match = '1 foo 22 bar 3' =~ /(\d+)/g) {
                print "$1 ";
            }

        which generates the same output as above. Your code is rather like this last version except that you have "unwound" the loop.

        To get the behaviour you thought a match has without the /g (matching only at the start of the string), you need to "anchor" the match at the start of the string using ^:

            my @matches = '1 foo 22 bar 3' =~ /^\d+/g;
            print "@matches";

        which prints '1'. For further regex reading see perlretut, perlre and perlreref.
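To tie this back to the "def" in "abcdefg" question, here is a minimal sketch contrasting an unanchored match with an anchored one, both without /g. The helper names `found_anywhere` and `found_at_start` are made up for this example, and `\Q...\E` is used so the argument is treated as literal text.

```perl
use strict;
use warnings;

# Unanchored: the engine tries every start position in the string,
# so no .* padding and no /g are needed.
sub found_anywhere {
    my ($string, $literal) = @_;
    return $string =~ /\Q$literal\E/ ? 1 : 0;
}

# Anchored with ^: only a match beginning at the start of the string counts.
sub found_at_start {
    my ($string, $literal) = @_;
    return $string =~ /^\Q$literal\E/ ? 1 : 0;
}

print "def anywhere: ", found_anywhere('abcdefg', 'def'), "\n";
print "def at start: ", found_at_start('abcdefg', 'def'), "\n";
print "abc at start: ", found_at_start('abcdefg', 'abc'), "\n";
```

The first and third checks succeed and the second does not, since "def" occurs in the string but not at its start.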

