regex problem: doesn't work on the first search but works on the second

onlyauto has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a question/problem regarding regex. So I have a hash of hashes containing strings. The strings each contain some characters, some digits and at the end it contains either one of the suffixes "AAA", "BBB" or "CCC". By using loops and regex I'm trying to access each piece of information and print messages depending on the suffix of each string. The part of the code I'm having problems with looks something like this:

my %hash; # hash of hashes already defined and filled in an earlier part of code

foreach (sort keys %{$hash{$key1}}){
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+AAA/gi){
    print "found1\n";
    }
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+BBB/gi){
    print "found2\n";
    }
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+CCC/gi){
    print "found3\n";
    }
}

I found that the regex has no problem with the data entries containing the suffixes "BBB" and "CCC" but all the suffixes with "AAA" are ignored. So the output looks like this:

 found2
 found3
 found2
 found3
 ...

I found that if I just copied the 'if' block for the "AAA" suffixes and pasted it directly after the first section, the first regex still doesn't match, but the second one does:

foreach (sort keys %{$hash{$key1}}){
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+AAA/gi){
    print "found1\n";
    }
# copied from above and changed the output to 'foundx'
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+AAA/gi){
    print "foundx\n";
    }
#
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+BBB/gi){
    print "found2\n";
    }
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+CCC/gi){
    print "found3\n";
    }
}

The output is as follows:

 foundx
 found2
 found3
 foundx
 found2
 found3
 ...

I have no clue what's going on here... Any help would be appreciated. Thanks!


Re: regex problem: doesn't work on the first search but works on the second by rminner (Chaplain) on Nov 28, 2013 at 09:29 UTC
Does it work if you drop the /g at the end of your regex? Some food for thought:

use strict;
use warnings;

my $string = 'foo bar';
if ($string =~ m{bar}g) {
    print "bar\n";
}
if ($string =~ m{foo}g) {
    print "foo\n";
}
if ($string =~ m{bar}g) {
    print "bar\n";
}

Output:

 bar
 bar

Without the /g it will print:

 bar
 foo
 bar

With /g, a successful match remembers the position in the string where it ended, and the next /g match on the same string continues from that position instead of from the start. If you just want to match your entire string (as it seems in this case), simply try dropping the /g. As tobyink said, it would be easier if you provided some input data.

Update: A Stack Overflow link which explains it: http://stackoverflow.com/questions/6969208/help-understanding-global-flag-in-perl
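To tie this back to the original loop, here is a minimal sketch of how a leftover match position could produce the "fails on the first try, works on the second" symptom. The hash contents and the earlier `/\d+/g` scan are assumptions made up for illustration; the original post does not show what ran before the loop.

```perl
use strict;
use warnings;

# Hypothetical data; the real strings were never posted.
my %hash = ( k => { a => 'foo_12_barAAA' } );
my @log;

# Assumption: some earlier code ran a scalar-context /g match on the value,
# which leaves pos() set on that string (here just past the digits).
my $seen = $hash{k}{a} =~ /\d+/g;
push @log, 'pos after earlier scan: ' . pos( $hash{k}{a} );

# First attempt: matching resumes at pos(), where ^ cannot match, so it
# fails, and the failure resets pos() to undef.
push @log, 'first try: '
    . ( $hash{k}{a} =~ /^.+_(\d+)_.+AAA/gi ? 'match' : 'no match' );

# Second attempt: pos() is undef again, so matching starts at offset 0.
push @log, 'second try: '
    . ( $hash{k}{a} =~ /^.+_(\d+)_.+AAA/gi ? 'match' : 'no match' );

# Without /g the stored position is ignored, so the match succeeds at once.
pos( $hash{k}{a} ) = 6;    # simulate a leftover position again
push @log, 'without /g: '
    . ( $hash{k}{a} =~ /^.+_(\d+)_.+AAA/i ? 'match' : 'no match' );

print "$_\n" for @log;
```

Under these assumptions only the first `/g` attempt reports `no match`, which mirrors the missing `found1` lines in the question.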
Re^2: regex problem: doesn't work on the first search but works on the second by Anonymous Monk on Nov 28, 2013 at 10:06 UTC
Thanks for the quick replies. Unfortunately, I cannot provide the exact data due to confidential agreements with my customers. But dropping the 'g' as rminner suggested did the trick. However I don't really understand the logic behind this. Could you perhaps explain why? (or link a thread, where this is explained?). Thanks a lot for the help!
Re^3: regex problem: doesn't work on the first search but works on the second by AnomalousMonk (Archbishop) on Nov 28, 2013 at 19:52 UTC
... I cannot provide the exact data due to confidential[ity] agreements ...

No need to provide proprietary data or code. A small, standalone, working example including dummy data such as provided by tobyink would have done the trick. Indeed, the process of writing and verifying such code would probably have given you valuable insight into what was going on: tobyink's code seems to do just what you want even with needless `/g` modifiers, and this should have rung a bell or two for you.
Re^3: regex problem: doesn't work on the first search but works on the second by AnomalousMonk (Archbishop) on Nov 28, 2013 at 19:34 UTC
Could you perhaps explain why?

In conjunction with the SO explanations linked by rminner, consider this variation on rminner's code. The `@-` special match array variable (see perlvar) is used to show the starting position of the match that is found (`$-[0]`). Then pos is printed to show the string position from which further matching will continue. The second match never finds `'foo'` because when it tries to do so, it's already past it! Now remove all the `/g` regex modifiers from all the matches and see what happens. See perlre, perlretut.

>perl -wMstrict -le
"my $string = 'foo bart bare';
 ;;
 if ($string =~ m{bar}g) { print qq{found bar at offset $-[0]}; }
 print 'string pos is ', defined(pos $string) ? pos $string : 'UNDEF';
 ;;
 if ($string =~ m{foo}g) { print qq{found foo at offset $-[0]}; }
 print 'string pos is ', defined(pos $string) ? pos $string : 'UNDEF';
 ;;
 if ($string =~ m{bar}g) { print qq{found bar at offset $-[0]}; }
 print 'string pos is ', defined(pos $string) ? pos $string : 'UNDEF';
"
found bar at offset 4
string pos is 7
string pos is UNDEF
found bar at offset 4
string pos is 7

In the code as shown, the second `'bar'` match at offset 9 is never found because the intervening `m{foo}g` match fails and 'resets' the string match position. With all `/g` modifiers in place, try matching with `m{foo}gc` (note the added `/c` modifier) and see what happens.
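For readers who want to check their answer to the `/c` exercise, a minimal sketch using the same string might look like the following. The variable names `@offsets`, `$found_foo` and `$pos_after_fail` are introduced here for illustration only.

```perl
use strict;
use warnings;

my $string = 'foo bart bare';
my @offsets;

# First /g match: finds 'bar' at offset 4 and leaves pos() at 7.
push @offsets, $-[0] if $string =~ m{bar}g;

# Failed match with /gc: the /c modifier keeps pos() where it was
# instead of resetting it to undef.
my $found_foo      = $string =~ m{foo}gc ? 1 : 0;
my $pos_after_fail = pos $string;

# The next /g match resumes from offset 7 and finds the second 'bar'.
push @offsets, $-[0] if $string =~ m{bar}g;

print "bar offsets: @offsets\n";
print "foo found: $found_foo, pos after failed /gc match: $pos_after_fail\n";
```

This prints `bar offsets: 4 9`, so the `'bar'` at offset 9 is reached once the failed `m{foo}gc` no longer resets the position.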
Re: regex problem: doesn't work on the first search but works on the second by tobyink (Canon) on Nov 28, 2013 at 09:02 UTC
You're not showing us any of your input data, but the following certainly works for me:

my $key1 = 'xxx';
my %hash = (
    $key1 => {
        a => 'foo_1_AAAA',
        b => 'foo_1_BBBB',
        c => 'foo_1_CCCC',
    },
);

foreach (sort keys %{$hash{$key1}}) {
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+AAA/gi) {
        print "found1\n";
    }
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+BBB/gi) {
        print "found2\n";
    }
    if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+CCC/gi) {
        print "found3\n";
    }
}

`use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name`
Re^2: regex problem: doesn't work on the first search but works on the second by Anonymous Monk on Nov 29, 2013 at 11:48 UTC
Thanks guys for all the posts. It was very helpful. I am reworking the code now to be more efficient. Thanks again!
Re: regex problem: doesn't work on the first search but works on the second by kcott (Archbishop) on Nov 29, 2013 at 06:23 UTC
G'day onlyauto,

Welcome to the monastery.

I see you have a solution to your posted problem. There's another issue I thought I'd point out. All those `if` statements:

- duplicate a lot of code
- incur a lot of unnecessary processing
- are error-prone from a maintenance perspective
- do not lend themselves to extensibility

Consider writing your code more along these lines:

#!/usr/bin/env perl -l

use strict;
use warnings;

my @data = ('X_12_yzAAA', 'X_34_yzBBB', 'X_56_yzCCC', 'X_78_yzAAA',);

my $re = qr{^.+_(\d+)_.+([A-C]{3})$};

my %despatch = (
    AAA => sub { print 'found1 with digits: ', shift },
    BBB => sub { print 'found2 with digits: ', shift },
    CCC => sub { print 'found3 with digits: ', shift },
);

/$re/ && $despatch{$2}->($1) for @data;

Output:

 found1 with digits: 12
 found2 with digits: 34
 found3 with digits: 56
 found1 with digits: 78

Now you have:

- a single regex, only one thing to possibly change here
- no unnecessary processing: all done in a single statement
- less chance of maintenance errors because there's a lot less code
- an easily extensible solution: just add '`DDD => sub { ... }`' (or whatever) to `%despatch` [Update: and change `A-C` to `A-D` in one place.]

-- Ken
Re^2: regex problem: doesn't work on the first search but works on the second by hdb (Monsignor) on Nov 29, 2013 at 07:07 UTC
You could build your regex from the keys of the `%despatch` hash. Then you only have to change the code in one place.
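One possible way to do that, sketched on top of kcott's example: the handlers here return their message instead of printing it, and `$suffixes` and `@messages` are names introduced for this illustration, not part of the original code.

```perl
use strict;
use warnings;

my @data = ('X_12_yzAAA', 'X_34_yzBBB', 'X_56_yzCCC', 'X_78_yzAAA');

my %despatch = (
    AAA => sub { "found1 with digits: $_[0]" },
    BBB => sub { "found2 with digits: $_[0]" },
    CCC => sub { "found3 with digits: $_[0]" },
);

# Build the suffix alternation from the hash keys. quotemeta guards against
# keys containing regex metacharacters; sort keeps the pattern stable.
my $suffixes = join '|', map { quotemeta($_) } sort keys %despatch;
my $re       = qr{^.+_(\d+)_.+($suffixes)$};

my @messages;
for my $item (@data) {
    # $1 holds the digits, $2 the suffix that selects the handler.
    push @messages, $despatch{$2}->($1) if $item =~ $re;
}

print "$_\n" for @messages;
```

With this arrangement, adding a `DDD` handler to `%despatch` extends the pattern automatically, so the character class no longer needs a matching edit.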