Hosen1989 has asked for the wisdom of the Perl Monks concerning the following question:

Dear ALL,

I was parsing a log file and came across this bug (I think). I have added the simple code below to show you what I am facing:

use strict;
use warnings;

my $data = 'blabla;tag1=12345;blabla;';
# my $data = 'blabla;tag1=12345;blabla;tag2=99999';

# get tag1 value
$data =~ m/tag1=(\d+)/g;
my $tag1 = $1;

# get tag2 value
$data =~ m/tag2=(\d+)/g;
my $tag2 = $1;

print "tag1 = $tag1\n";
print "tag2 = $tag2\n";

The output:

tag1 = 12345
tag2 = 12345

As you can see, there is only a tag1 value in $data, so the second match should fail and $tag2 should be undefined,

but what I got is $tag1 = $tag2!

So can any monk here (pretty please) explain to me what is happening here?

BR

Hosen

Re: ambiguous regex match
by Corion (Patriarch) on Jul 24, 2015 at 16:46 UTC

    A failed match doesn't reset $1 and $2. You need to guard your assignments with if statements:

    my( $tag1, $tag2 );

    # get tag1 value
    if( $data =~ m/tag1=(\d+)/) {
        $tag1 = $1;
    };

    # get tag2 value
    if( $data =~ m/tag2=(\d+)/ ) {
        $tag2 = $1;
    };
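To make the point about stale capture variables concrete, here is a minimal sketch. The helper name `tag_value` is hypothetical and introduced only for illustration; the sample string is the one from the question. It first shows that `$1` keeps its old value after a failed match, then reads `$1` only when the match succeeded.

```perl
use strict;
use warnings;

my $data = 'blabla;tag1=12345;blabla;';

# Unguarded: the first match succeeds and sets $1.
$data =~ /tag1=(\d+)/;
# The second match fails, so $1 is left untouched.
my $matched = $data =~ /tag2=(\d+)/;
my $stale   = $1;    # still holds the value captured by the tag1 match

# Hypothetical helper (not from the thread): returns the digits captured
# for the given tag, or undef when the tag is absent.
sub tag_value {
    my ($text, $tag) = @_;
    # $1 is only read when the match itself succeeded.
    return $text =~ /\Q$tag\E=(\d+)/ ? $1 : undef;
}

my $tag1 = tag_value($data, 'tag1');
my $tag2 = tag_value($data, 'tag2');

printf "stale \$1 after failed match = %s\n", $stale;
printf "tag1 = %s\n", $tag1 // 'undef';
printf "tag2 = %s\n", $tag2 // 'undef';
```

The defined-or operator `//` needs Perl 5.10 or later; on older versions a `defined` test would do the same job.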

    Also note that the /g modifier doesn't make sense here, since each pattern only needs to match once.
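Beyond being unnecessary, /g can change the result when the match is evaluated in scalar context, because a successful match records its end position in `pos()` and the next /g match on the same string resumes from there. The sketch below assumes a sample string (not from the thread) in which tag2 appears before tag1.

```perl
use strict;
use warnings;

# Assumed sample input: tag2 appears BEFORE tag1.
my $data = 'tag2=99999;tag1=12345;';

# Scalar-context /g match: succeeds and remembers where it ended.
my $found_tag1 = $data =~ /tag1=(\d+)/g;
my $pos_after  = pos($data);

# The next /g match on the same string starts at that position,
# so tag2, which sits earlier in the string, is not found.
my $found_tag2_g = $data =~ /tag2=(\d+)/g;

# Without /g the search always starts at the beginning of the string.
my $found_tag2 = $data =~ /tag2=(\d+)/;

printf "tag1 with /g:    %s (pos = %d)\n", $found_tag1 ? 'found' : 'not found', $pos_after;
printf "tag2 with /g:    %s\n", $found_tag2_g ? 'found' : 'not found';
printf "tag2 without /g: %s\n", $found_tag2   ? 'found' : 'not found';
```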

Re: ambiguous regex match
by stevieb (Canon) on Jul 24, 2015 at 16:52 UTC

    First, you're using the wrong $data variable. Second, as Corion points out, you need to protect with if.

    my $data = 'blabla;tag1=12345;blabla;tag2=99999';

    my $tag1;
    my $tag2;

    $data =~ /tag1=(\d+)/;
    $tag1 = $1 if $1;

    $data =~ /tag2=(\d+)/;
    $tag2 = $1 if $1;

    #
    # or
    #
    # the '?' in the below regex means it'll grab the closest
    # tag2, and ignore any beyond it

    ($tag1, $tag2) = $data =~ /tag1=(\d+).*?tag2=(\d+)/ if $1 && $2;

    print "$tag1, $tag2";

    Update: I could have sworn you were using $2 in the second match. If that was the case and you edited your post, I want to point out that when doing independent matches, the second run to get a match will re-use $1. To use $2, you need both capture groups within a single regex.
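As a rough illustration of the single-regex case, the sketch below puts both capture groups in one pattern and assigns the captures in list context. The sample strings follow the question; the variable names `$none1` and `$none2` are made up for the failing case.

```perl
use strict;
use warnings;

my $data = 'blabla;tag1=12345;blabla;tag2=99999';

# One pattern, two capture groups: the first group fills $1 (here $tag1),
# the second fills $2 (here $tag2). In list context the match returns
# its captures directly.
my ($tag1, $tag2) = $data =~ /tag1=(\d+).*?tag2=(\d+)/;

# When the pattern fails, the match returns an empty list,
# so both variables stay undefined.
my ($none1, $none2) = 'blabla;tag1=12345;blabla;' =~ /tag1=(\d+).*?tag2=(\d+)/;

print "tag1 = $tag1, tag2 = $tag2\n";
print "second string: ", (defined $none1 ? "matched" : "no match"), "\n";
```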

      $data =~ /tag2=(\d+)/;
      $tag2 = $1 if $1;

      I am afraid this will not work correctly if the regex fails, because $1 will still be set to its previous value (from the previous successful match). So rather than testing $1, we should test whether the regex matched.

      I think you would need something like this instead:

      $tag1 = $1 if $data =~ /tag1=(\d+)/;
      $tag2 = $1 if $data =~ /tag2=(\d+)/;
Re: ambiguous regex match
by Marshall (Canon) on Jul 26, 2015 at 11:50 UTC

    In general, with Perl, avoid working with the $1, $2, etc. variables directly. There are exceptions, but this is not one of them.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my @data;
    while (<DATA>)
    {
        next if ( /^\s*$/ ); #skip blank lines

        # easy way to get numbers after "="
        my ($tag1, $tag2, $extra) = /\s*=\s*(\d+)/g;

        #skip malformed lines..
        #must be 2 number tokens after "="
        if (defined $tag1 and defined $tag2 and not defined $extra)
        {
            print "$tag1 \t$tag2\n";
        }
        else
        {
            print "Bad Line:$_";
        }
    }

    =prints
    12345   99999
    Bad Line:blabla;tag1=12345;blabla
    Bad Line:awdrfaf; tag2 = 122345;
    6768    9876
    Bad Line:079a8; tag1 =345; tag2 = 895; tag3=3987;
    12567   98999
    7899    8977
    =cut

    __DATA__
    blabla;tag1=12345;blabla;tag2=99999
    blabla;tag1=12345;blabla
    awdrfaf; tag2 = 122345;
    asdf;tag 1 = 6768; tag2= 9876;
    079a8; tag1 =345; tag2 = 895; tag3=3987;
    blabla;tag1 =12567;blabla;tag2= 98999
    blabla;tag1= 7899;blabla; tag2 =8977
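A possible variation on the same list-context idea, sketched here under the assumption that tag names consist only of word characters (so a name like "tag 1" in the sample data would not be handled): capture both the name and the value, and let the flat list returned by /g fill a hash. The `%tags` hash and the `$line` variable are illustrative names, not part of the code above.

```perl
use strict;
use warnings;

# One of the sample lines from the __DATA__ section above.
my $line = 'blabla;tag1 =12567;blabla;tag2= 98999';

# Two capture groups with /g in list context yield a flat
# (name, value, name, value, ...) list, which fills a hash directly.
my %tags = $line =~ /(\w+)\s*=\s*(\d+)/g;

for my $name (sort keys %tags) {
    print "$name = $tags{$name}\n";
}

# A missing tag is simply an absent key; no capture variable is involved.
print "tag3 is absent\n" unless exists $tags{tag3};
```

With this shape, checking whether a tag was present becomes an `exists` test on the hash rather than a test on `$1`.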