in reply to Multiple if statements matching part of one variable problem

Hi, jmclark -

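The question you ask is, I think, one that everyone working with regular expressions runs into at some stage. As GrandFather already explained, the problem comes from how Perl's regular expression engine handles m//g in scalar context: a successful match records where it stopped (pos) on the variable being matched, and the next m//g against the same variable resumes its search from that position instead of from the start of the string.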
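To make the interference visible, here is a sketch that runs the same five tests directly against $gw with m//ig. It assumes your original code matched $gw itself in each if; the loop and the @hits array are only for illustration:

```perl
use strict;
use warnings;

my $gw = "abcdefgh";
my @hits;    # patterns that matched, in order

# Every test runs m//g in scalar context against the SAME variable,
# so a successful match leaves pos($gw) just past the match, and the
# next m//g starts searching from that offset.
for my $pat (qw(abc cde defgh gh fg)) {
    if ($gw =~ m/$pat/ig) {
        print pos($gw), ": $pat\n";
        push @hits, $pat;
    }
    # A failed m//g resets pos($gw) to undef, so the following test
    # starts again at the beginning of the string.
}
```

This prints only "3: abc", "8: defgh" and "7: fg": the cde test starts searching at offset 3 and the gh test at offset 8, so both fail.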
A common way to avoid this kind of unwanted interference with the match position is to write the match checks as follows:
my $gw = "abcdefgh";
if ( ( my $a = $gw ) =~ m/abc/ig )   { print pos( $a ), ": abc\n"; }
if ( ( my $b = $gw ) =~ m/cde/ig )   { print pos( $b ), ": cde\n"; }
if ( ( my $c = $gw ) =~ m/defgh/ig ) { print pos( $c ), ": defgh\n"; }
if ( ( my $d = $gw ) =~ m/gh/ig )    { print pos( $d ), ": gh\n"; }
if ( ( my $e = $gw ) =~ m/fg/ig )    { print pos( $e ), ": fg\n"; }
In effect, each test copies the contents of $gw into a fresh variable ($a, $b, ...) and matches against that copy rather than against $gw itself. Because every copy starts with no stored match position, each search begins at the start of the string, and the code prints what you originally expected:
3: abc
5: cde
8: defgh
8: gh
7: fg
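If all you need is the offset where each match ended, a variant that avoids the copies altogether (a swapped-in alternative, not the approach above) is to drop /g and read the end offset from the @+ array documented in perlvar. The loop and the %end hash are illustrative:

```perl
use strict;
use warnings;

my $gw = "abcdefgh";
my %end;    # pattern => offset just past the end of its match

# Without /g, every match starts at offset 0 and no pos() is stored
# on $gw, so the tests cannot interfere with each other. After a
# successful match, $+[0] holds the offset where the whole match ended,
# the same number pos() would report for an m//g match.
for my $pat (qw(abc cde defgh gh fg)) {
    if ($gw =~ m/$pat/i) {
        $end{$pat} = $+[0];
        print "$+[0]: $pat\n";
    }
}
```

It prints the same five lines as the copying version, and pos($gw) is never set.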
Hope this helps.

Regards -

Pat

Re^2: Multiple if statements matching part of one variable problem
by jmclark (Novice) on Sep 10, 2008 at 14:14 UTC
    Ah, OK, so that's a "feature" of Perl's regex engine. So basically that code does the same thing as me just duplicating the $gw var and then matching against the newly created variables. Maybe it's my misunderstanding of "g", but I thought it meant "match globally, anywhere in the string". So if I wanted to match "def" in the string "abcdefg", I would use "g"; otherwise I'd have to match on something like /.*def.*/. Is that incorrect?

      Yes, that's completely incorrect.

      First, "g" doesn't do that at all. It means (more or less) "all instances".

      # Check for a match
      if (/pat/) { ... }

      # Find all matches
      while (/pat/g) { ... }
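      Here are those two idioms in runnable form, matching against a named variable instead of $_; the string and variable names are made up for illustration:

```perl
use strict;
use warnings;

my $text = 'a1b22c333';

# Plain m// answers "is there a match?" and stops at the first one.
my $found = ($text =~ /\d+/) ? 1 : 0;

# m//g in a loop finds each match in turn, resuming where the
# previous one ended, until no further match exists.
my $count = 0;
$count++ while $text =~ /\d+/g;

print "found=$found count=$count\n";    # prints "found=1 count=3"
```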

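      Second, that's not how regexps match at all: an unanchored pattern can match starting at any position in the string, so no leading .* is needed.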

      print( 'abc'   =~ /b/          ?1:0,  "\n");   # 1
      print( 'abc'   =~ /^b\z/       ?1:0,  "\n");   # 0
      print( 'abc'   =~ /^abc\z/     ?1:0,  "\n");   # 1
      print( 'a2b3c' =~ /(\d)/       ?$1:0, "\n");   # 2
      print( 'a2b3c' =~ /^.*(\d)/    ?$1:0, "\n");   # 3
      print( 'a2b3c' =~ /^.*?(\d)/   ?$1:0, "\n");   # 2
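      Applied to the string from the question, a short sketch (variable names are illustrative) shows that /def/ alone already finds "def" in the middle of "abcdefg", that the surrounding .* add nothing, and that only an anchor such as ^ limits where the match may start:

```perl
use strict;
use warnings;

my $str = 'abcdefg';

# An unanchored pattern may match anywhere in the string, so neither
# /g nor surrounding .* is needed to find "def" in the middle.
my $plain    = ($str =~ /def/)     ? 1 : 0;    # 1
my $padded   = ($str =~ /.*def.*/) ? 1 : 0;    # 1, but the .* add nothing
# Only an explicit anchor restricts where the match may start.
my $at_start = ($str =~ /^def/)    ? 1 : 0;    # 0

print "plain=$plain padded=$padded at_start=$at_start\n";
```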


      Yes, that is a misunderstanding of the role of the g modifier. Simplistically, /g means "match as many times as you can". In list context, the match returns all the matches it finds. Consider:

      my @matches = '1 foo 22 bar 3' =~ /\d+/g;
      print "@matches";

      Prints:

      1 22 3

      In scalar context, however, it returns true each time there is a "next" match, starting the search where the previous match ended. To see what was matched, we now have to capture the part we are interested in:

      while (my $match = '1 foo 22 bar 3' =~ /(\d+)/g) {
          print "$1 ";
      }

      which generates the same output as above. Your code is rather like this last version except that you have "unwound" the loop.
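      To make the "next match" bookkeeping visible, this sketch (the @trace array is illustrative) records where each match ends. Each scalar-context m//g resumes from the pos() of the variable, which is the same mechanism that made the separate if tests in the original question interfere with each other:

```perl
use strict;
use warnings;

my $str = '1 foo 22 bar 3';
my @trace;

# Each scalar-context m//g resumes at pos($str), the offset where the
# previous successful match ended. When no further match exists, the
# match fails, the loop ends and pos($str) is reset to undef.
while ($str =~ /(\d+)/g) {
    push @trace, "$1 ends at " . pos($str);
}
print "$_\n" for @trace;
print defined pos($str) ? "pos still set\n" : "pos reset\n";
```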

      To get the behaviour you expected from leaving off the /g (matching only at the start of the string), you need to "anchor" the match at the start of the string using ^:

      my @matches = '1 foo 22 bar 3' =~ /^\d+/g;
      print "@matches";

      which prints '1'. For further regex reading see perlretut, perlre and perlreref.


      Perl reduces RSI - it saves typing