onlyauto has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a question/problem regarding regex. So I have a hash of hashes containing strings. The strings each contain some characters, some digits and at the end it contains either one of the suffixes "AAA", "BBB" or "CCC". By using loops and regex I'm trying to access each piece of information and print messages depending on the suffix of each string. The part of the code I'm having problems with looks something like this:

    my %hash; # hash of hashes already defined and filled in an earlier part of code

    foreach (sort keys %{$hash{$key1}}){
        if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+AAA/gi){
            print "found1\n";
        }
        if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+BBB/gi){
            print "found2\n";
        }
        if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+CCC/gi){
            print "found3\n";
        }
    }

I found that the regex has no problem with the data entries containing the suffixes "BBB" and "CCC" but all the suffixes with "AAA" are ignored. So the output looks like this:

    found2
    found3
    found2
    found3
    ...

I found that if I just copied the 'if' block for the "AAA" suffixes and pasted it directly after the first section, the first regex still doesn't match, but the second one does:

    foreach (sort keys %{$hash{$key1}}){
        if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+AAA/gi){
            print "found1\n";
        }
        # copied from above and changed the output to 'foundx'
        if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+AAA/gi){
            print "foundx\n";
        }
        #
        if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+BBB/gi){
            print "found2\n";
        }
        if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+CCC/gi){
            print "found3\n";
        }
    }

The output is as follows:

    foundx
    found2
    found3
    foundx
    found2
    found3
    ...

I have no clue what's going on here... Any help would be appreciated. Thanks!

Replies are listed 'Best First'.
Re: regex problem: doesn't work on the first search but works on the second
by rminner (Chaplain) on Nov 28, 2013 at 09:29 UTC
    Does it work if you drop the /g at the end of your regex?
    Some food for thought:
        use strict;
        use warnings;

        my $string = 'foo bar';
        if ($string =~ m{bar}g) {
            print "bar\n";
        }
        if ($string =~ m{foo}g) {
            print "foo\n";
        }
        if ($string =~ m{bar}g) {
            print "bar\n";
        }

    output:

        bar
        bar

    without the /g it will print:

        bar
        foo
        bar
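    In scalar context, /g makes Perl remember the position in the string where the last successful match ended, so the next /g match on the same string continues from that position instead of from the start. A failed /g match resets the position to the beginning. If you just want to test your entire string (as it seems in this case), simply try dropping the /g.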
    As tobyink said, it would be easier if you provided some input data.
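
The 'found1' / 'foundx' behaviour from the question can be reproduced with dummy data. This is only a sketch under an assumption: the original data is not shown, so the string below is hypothetical, and the earlier /g match stands in for whatever the unseen earlier part of the code did to leave a match position set on the string.

```perl
use strict;
use warnings;

# Hypothetical dummy value standing in for one of the confidential strings.
my $value = 'X_12_yzAAA';

# Assumption: an earlier scalar-context /g match already ran on this string
# and left pos($value) at the end of the suffix.
my $matched_earlier = ($value =~ /AAA/g);

my @seen;

# First attempt starts at pos($value), so '^' cannot match; it fails and resets pos.
push @seen, 'found1' if $value =~ /^.+_(\d+)_.+AAA/gi;

# Second attempt starts from the beginning of the string again and succeeds.
push @seen, 'foundx' if $value =~ /^.+_(\d+)_.+AAA/gi;

print "$_\n" for @seen;    # prints only: foundx
```

Removing the /g from the two tests makes both of them match, because a match without /g always starts at the beginning of the string.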

    Update:
    A stackoverflow link which explains it: http://stackoverflow.com/questions/6969208/help-understanding-global-flag-in-perl

      Thanks for the quick replies. Unfortunately, I cannot provide the exact data due to confidential agreements with my customers. But dropping the 'g' as rminner suggested did the trick. However I don't really understand the logic behind this. Could you perhaps explain why? (or link a thread, where this is explained?). Thanks a lot for the help!

        ... I cannot provide the exact data due to confidential[ity] agreements ...

        No need to provide proprietary data or code. A small, standalone, working example including dummy data such as provided by tobyink would have done the trick. Indeed, the process of writing and verifying such code would probably have given you valuable insight into what was going on: tobyink's code seems to do just what you want even with needless /g modifiers, and this should have rung a bell or two for you.

        Could you perhaps explain why?

        In conjunction with the SO explanations linked by rminner, consider this variation on rminner's code. The @- special match array variable (see perlvar) is used to show the starting position of each match that is found: $-[0]. Then pos is printed to show the string position from which further matching will continue. The second match never finds 'foo' because when it tries to do so, it's already past it! Now remove all the /g regex modifiers from all the matches and see what happens. See perlre, perlretut.

            >perl -wMstrict -le
            "my $string = 'foo bart bare';
             ;;
             if ($string =~ m{bar}g) { print qq{found bar at offset $-[0]}; }
             print 'string pos is ', defined(pos $string) ? pos $string : 'UNDEF';
             ;;
             if ($string =~ m{foo}g) { print qq{found foo at offset $-[0]}; }
             print 'string pos is ', defined(pos $string) ? pos $string : 'UNDEF';
             ;;
             if ($string =~ m{bar}g) { print qq{found bar at offset $-[0]}; }
             print 'string pos is ', defined(pos $string) ? pos $string : 'UNDEF';
            "
            found bar at offset 4
            string pos is 7
            string pos is UNDEF
            found bar at offset 4
            string pos is 7

        In the code as shown, the second 'bar' match at offset 9 is never found because the intervening m{foo}g match fails and 'resets' the string match position. With all /g modifiers in place, try matching with m{foo}gc (note the added /c modifier) and see what happens.
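
For readers who want to see the /c experiment without running the one-liner, here is a minimal sketch using the same dummy string. The variable names (@offsets, $foo_found, $pos_after_fail) are illustrative and not from the post above.

```perl
use strict;
use warnings;

my $string = 'foo bart bare';
my @offsets;

# First 'bar' is found at offset 4; pos($string) becomes 7.
push @offsets, $-[0] if $string =~ m{bar}g;

# 'foo' lies before pos, so this fails; with /c the failure does not reset pos.
my $foo_found      = $string =~ m{foo}gc ? 1 : 0;
my $pos_after_fail = pos $string;

# Matching continues from offset 7, so the second 'bar' at offset 9 is found.
push @offsets, $-[0] if $string =~ m{bar}g;

print "bar offsets: @offsets\n";                         # bar offsets: 4 9
print "pos after failed foo match: $pos_after_fail\n";   # 7
```

With plain m{foo}g in place of m{foo}gc, the failed match resets the position and the last test finds the 'bar' at offset 4 a second time, as in the output shown above.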

Re: regex problem: doesn't work on the first search but works on the second
by tobyink (Canon) on Nov 28, 2013 at 09:02 UTC

    You're not showing us any of your input data, but the following certainly works for me:

        my $key1 = 'xxx';
        my %hash = (
            $key1 => {
                a => 'foo_1_AAAA',
                b => 'foo_1_BBBB',
                c => 'foo_1_CCCC',
            },
        );

        foreach (sort keys %{$hash{$key1}}) {
            if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+AAA/gi) {
                print "found1\n";
            }
            if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+BBB/gi) {
                print "found2\n";
            }
            if ($hash{$key1}{$_}=~ /^.+_(\d+)_.+CCC/gi) {
                print "found3\n";
            }
        }

    use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name

      Thanks guys for all the posts. It was very helpful. I am reworking the code now to be more efficient. Thanks again!

Re: regex problem: doesn't work on the first search but works on the second
by kcott (Archbishop) on Nov 29, 2013 at 06:23 UTC

    G'day onlyauto,

    Welcome to the monastery.

    I see you have a solution to your posted problem. There's another issue I thought I'd point out.

    All those if statements:

    • duplicate a lot of code
    • incur a lot of unnecessary processing
    • are error-prone from a maintenance perspective
    • do not lend themselves to extensibility

    Consider writing your code more along these lines:

        #!/usr/bin/env perl -l

        use strict;
        use warnings;

        my @data = ('X_12_yzAAA', 'X_34_yzBBB', 'X_56_yzCCC', 'X_78_yzAAA',);

        my $re = qr{^.+_(\d+)_.+([A-C]{3})$};

        my %despatch = (
            AAA => sub { print 'found1 with digits: ', shift },
            BBB => sub { print 'found2 with digits: ', shift },
            CCC => sub { print 'found3 with digits: ', shift },
        );

        /$re/ && $despatch{$2}->($1) for @data;

    Output:

        found1 with digits: 12
        found2 with digits: 34
        found3 with digits: 56
        found1 with digits: 78

    Now you have:

    • a single regex, only one thing to possibly change here
    • no unnecessary processing: all done in a single statement
    • less chance of maintenance errors because there's a lot less code
    • an easily extensible solution: just add 'DDD => sub { ... }' (or whatever) to %despatch [Update: and change A-C to A-D in one place.]

    -- Ken

      You could build your regex from the keys of the %despatch hash. Then you only have to change the code in one place.
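
A minimal sketch of that suggestion, reusing kcott's dummy data. It is one possible way to do it, not code from the thread: the handlers here return their message instead of printing it, and the names $suffixes and @messages are illustrative.

```perl
use strict;
use warnings;

my @data = ('X_12_yzAAA', 'X_34_yzBBB', 'X_56_yzCCC', 'X_78_yzAAA');

# Handlers return the message so the caller decides what to do with it.
my %despatch = (
    AAA => sub { "found1 with digits: $_[0]" },
    BBB => sub { "found2 with digits: $_[0]" },
    CCC => sub { "found3 with digits: $_[0]" },
);

# Build the suffix alternation from the hash keys, so a new suffix
# only has to be added to %despatch. quotemeta guards against keys
# that happen to contain regex metacharacters.
my $suffixes = join '|', map { quotemeta($_) } sort keys %despatch;
my $re       = qr{^.+_(\d+)_.+($suffixes)$};

my @messages;
for my $item (@data) {
    next unless $item =~ $re;               # no /g: always match from the start
    push @messages, $despatch{$2}->($1);
}

print "$_\n" for @messages;
```

Adding a line such as 'DDD => sub { ... }' to %despatch is then the only change needed to support a new suffix.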