knudsj01 has asked for the wisdom of the Perl Monks concerning the following question:

I'm having a strange issue with some pattern matching I'm doing in a multiline file. I'm using a regular expression with the gm operators to search for two different patterns. When I run my Perl script, it finds the first pattern but fails on the 2nd, even though the pattern is clearly in the file. Here is a test script I wrote that shows the problem.
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $contents =<<EOT;
    ####################
    # GEOGRAPHY CONFIG #
    ####################
    environment.type = eu
    deployment.type = au

    EOT

    #------------------------------------------------------------------------------
    # Check that the properties deployment.type and environment.type exist
    #------------------------------------------------------------------------------
    if ( $contents !~ /deployment.type/gm ) {
        print "ERROR: deployment.type not found in contents\n";
    }
    else {
        print "deployment.type matched\n";
    }

    if ( $contents !~ /environment.type/gm ) {
        print "ERROR: environment.type not found in contents\n";
    }
    else {
        print "environment.type matched\n";
    }
When I run this perl script I get the following:
    C:\test>perl test.pl
    deployment.type matched
    ERROR: environment.type not found in contents
One thing I have noticed is that if I flip the two pattern searches around (search for environment.type first and deployment.type 2nd, in the order they appear in $contents), it finds both patterns. But if the pattern searches are reversed in order from the way they appear in $contents, the 2nd pattern match fails.

It's as if, once it matches the 1st string, it gets stuck near the end of $contents and doesn't begin searching from the beginning of $contents when looking for the 2nd string.

I am trying to understand why this happens. Why should the order of these pattern searches matter? I thought that the gm operators would search all of $contents from beginning to end and be reset before beginning the 2nd search, but it doesn't seem like that is happening. Does anyone know why this occurs?

Replies are listed 'Best First'.
Re: Problem with 2nd string match in file using regex with gm operators
by ikegami (Patriarch) on Jun 03, 2009 at 20:42 UTC

    I thought that the gm operators would search all of $contents from beginning to end and be reset

    No. In scalar context, m//g finds the next match and notes where it found the match so it can use that as the starting point for the next match. It would be a waste of CPU to look for more than one match, and it notes where it found the match to allow constructs such as

    while (/.../g) { ... }
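The position bookkeeping described above can be observed with pos(). The following is a minimal sketch, not the original poster's script: the two-line $text and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Illustrative input (an assumption): two properties, in file order.
my $text = "environment.type = eu\ndeployment.type = au\n";

# Scalar-context m//g: a successful match records where it ended.
$text =~ /deployment/g;
my $after_first = pos($text);    # offset just past "deployment"

# The next m//g on the same string starts from that offset, so text
# that occurs earlier in the string can no longer be found.
my $found_earlier = ( $text =~ /environment/g ) ? 1 : 0;

# That failed match cleared pos(), so this loop starts again at offset 0
# and walks through every occurrence of "type".
my @offsets;
while ( $text =~ /type/g ) {
    push @offsets, pos($text);
}

print "pos after first match: $after_first\n";
print "environment found afterwards: $found_earlier\n";
print "offsets after each 'type': @offsets\n";
```

The while loop is the case the remembered position exists for: each iteration resumes where the previous match ended, and the loop stops when a match finally fails.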

    Drop the "g".

    Drop the "m" while you're at it. "m" affects the behaviour of ^ and $, neither of which you are using.

    Update: Oh, and /./ should be /\./ since /./ means "match any character", not "match a period".
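Putting the three suggestions together (no /g, no /m, escaped dot), the existence check might look like the sketch below. The helper name has_property is hypothetical and not from the original script; \Q...\E is used so the dot in the property name is matched literally.

```perl
use strict;
use warnings;

my $contents = <<'EOT';
environment.type = eu
deployment.type = au
EOT

# Hypothetical helper: true if $text contains the literal property name.
sub has_property {
    my ( $text, $name ) = @_;
    # No /g, so the match always starts at the beginning of $text.
    return $text =~ /\Q$name\E/ ? 1 : 0;
}

# Search order no longer matters.
my @results = map { has_property( $contents, $_ ) }
    qw(deployment.type environment.type);

# An unescaped dot matches any character, not just a period:
my $loose  = 'deploymentXtype' =~ /deployment.type/  ? 1 : 0;
my $strict = 'deploymentXtype' =~ /deployment\.type/ ? 1 : 0;

print "results: @results\n";
print "loose=$loose strict=$strict\n";
```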

Re: Problem with 2nd string match in file using regex with gm operators
by shmem (Chancellor) on Jun 03, 2009 at 23:18 UTC
    Does anyone know why this occurs?

    (in addition to ikegami's response above) - that's also because your $contents

        my $contents =<<EOT;
        ####################
        # GEOGRAPHY CONFIG #
        ####################
        environment.type = eu
        deployment.type = au

        EOT

    contains " = au\n\n" after deployment.type. The regular expression engine sets its pointer just after the "type" in "deployment.type = au" - but the string doesn't finish at that point. That's why the position isn't reset.

    But even if you corrected that condition, the position at the end of the string would not be reset, since it is only reset when a /g match fails, i.e. when the search runs past the end of the string without finding anything. Consider:

        use strict;
        use warnings;

        my $contents = "
        ####################
        # GEOGRAPHY CONFIG #
        ####################
        environment.type = eu
        deployment.type = au";

        if ( $contents !~ /deployment.type = (\w+)/gm) {
            print "ERROR: deployment.type not found in contents\n";
        }
        else {
            print "deployment.type matched\n";
        }

        $contents =~ /./g; # <-- this search passes end of string, and resets

        if ( $contents !~ /environment.type = (\w+)/gm ) {
            print "ERROR: environment.type not found in contents\n";
        }
        else {
            print "environment.type matched\n";
        }
        __END__

        qwurx [shmem] 01:36 ~ > perl au.pl
        deployment.type matched
        environment.type matched
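To make the reset visible, the sketch below inspects pos() at each step and also shows an explicit reset by assigning to pos(). The shortened $contents and the variable names are assumptions for illustration; this is not shmem's script.

```perl
use strict;
use warnings;

# Shortened input (an assumption); note there is no trailing newline.
my $contents = "environment.type = eu\ndeployment.type = au";

# This match consumes everything up to the end of the string.
$contents =~ /deployment\.type = (\w+)/g;
my $at_end = pos($contents);    # equals length($contents) here

# Option 1: a failed m//g clears pos() (unless the /c modifier is given).
$contents =~ /./g;              # nothing left to match, so this fails
my $after_fail = defined pos($contents) ? 'set' : 'undef';

# Option 2: clear the position explicitly after a successful m//g.
$contents =~ /deployment/g;
pos($contents) = undef;
my $found = ( $contents =~ /environment\.type/g ) ? 1 : 0;

print "pos at end: $at_end of ", length($contents), "\n";
print "pos after failed match: $after_fail\n";
print "environment.type found after explicit reset: $found\n";
```

Assigning to pos() states the intent more directly than a throwaway match, although dropping /g entirely is simpler for a plain existence check.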