Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

In this contrived little example, I'm matching any "a" preceded by a "c" and any "t" preceded by a "k":
$_ = "caaat";
my @hits = m/(?:c(a)|k(t))/g;
foreach (@hits) { print "|$_|\n" }
It produces two hits - an "a" and an empty string. Yet if both conditions fail (by changing $_ to "zzzzz"), it produces no hits, as expected. Why the empty string? And can I avoid it, other than by post-processing @hits? Thanks!

Replies are listed 'Best First'.
Re: empty hits with regex
by GrandFather (Saint) on Oct 02, 2008 at 18:56 UTC

    You have two captures in the regex, so for each match you get two entries added to @hits. However, one of the captures will be undef because only one of the alternatives can match at a time. Instead you could use a single capture around lookbehind assertions:

    print '|', join ('|, |', "caaat" =~ m/( (?<=c)a | (?<=k)t )/gx), "|\n";

    Prints:

    |a|

    Note that if you were using strictures (use strict; use warnings;) you would have received a 'Use of uninitialized value ...' warning when printing the undef entry.


    Perl reduces RSI - it saves typing
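
A minimal sketch of the point above, assuming the same pattern and input as in the question: one successful match still fills both capture slots, and the slot whose alternative did not take part is undef rather than an empty string. The names $text and $shown are only illustrative.

```perl
use strict;
use warnings;

# Same pattern as in the question: two capture groups, only one can match at a time.
my $text = "caaat";
my @hits = $text =~ m/(?:c(a)|k(t))/g;

# One successful match, but two list elements (one per capture group).
printf "%d element(s)\n", scalar @hits;

# Show each slot explicitly instead of printing undef directly.
for my $i (0 .. $#hits) {
    my $shown = defined $hits[$i] ? "'$hits[$i]'" : 'undef';
    print "slot $i: $shown\n";
}
```

Checking with defined, as done here, avoids the 'Use of uninitialized value' warning that a plain print of the second slot would trigger.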
Re: empty hits with regex
by ikegami (Patriarch) on Oct 02, 2008 at 19:21 UTC
    You have two capturing parens in the pattern, so in list context the match operator returns two captures for every successful match. Always.

    Here are a couple of other solutions:

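    # Option 1: filter out the undefined captures.
    my @hits = grep defined, m/(?:c(a)|k(t))/g;

    # Option 2: match one hit at a time and keep whichever capture is defined.
    my @hits;
    while ( m/(?:c(a)|k(t))/g ) {
        push @hits, defined $1 ? $1 : $2;
    }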

    The last one is extensible too:

    my @hits;
    while ( m/(?:c(a)|k(t)|k(k))/g ) {
        push @hits, defined $1 ? $1 : defined $2 ? $2 : $3;
    }
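
As a hedged variation on the loop above, the chain of ternaries could be replaced by picking the one defined capture with grep, which may scale a little better as alternatives are added. The input string "cakt kk" is a made-up example chosen so that each alternative matches once; $hit is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical input: each of the three alternatives matches exactly once.
my $text = "cakt kk";

my @hits;
while ( $text =~ m/(?:c(a)|k(t)|k(k))/g ) {
    # At most one of $1, $2, $3 is defined per iteration; keep that one.
    my ($hit) = grep { defined } $1, $2, $3;
    push @hits, $hit;
}

print join(',', @hits), "\n";   # prints "a,t,k"
```

This is only a sketch: the list of capture variables still has to be kept in sync with the pattern by hand.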
      In 5.10, there's a much easier solution: the branch reset pattern, (?|...).

      use 5.010;
      $_ = "caaat";
      my @hits = /(?|c(a)|k(t))/g;
      say scalar @hits;
      say @hits;
      __END__
      1
      a

        You just made me want to get our infrastructure team to upgrade to perl 5.10. Thanks. Now I'm not going to get any real work done next week.
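
For readers trying the branch reset suggestion from earlier in the thread, here is a small sketch on a made-up input, "cakt", where both alternatives match. Inside (?|...) each alternative numbers its groups from the same starting point, so every match contributes a single capture and no undef entries appear. It assumes Perl 5.10 or later; $text is an illustrative name.

```perl
use strict;
use warnings;
use 5.010;   # branch reset (?|...) needs Perl 5.10 or later

# Hypothetical input: "ca" and "kt" both occur, so both alternatives match.
my $text = "cakt";

# Both (a) and (t) are capture group 1 inside the branch reset.
my @hits = $text =~ /(?|c(a)|k(t))/g;

printf "%d element(s): %s\n", scalar @hits, join(',', @hits);   # prints "2 element(s): a,t"
```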