steve077 has asked for the wisdom of the Perl Monks concerning the following question:

### Code Snippet

    @array = ('<Band:3>40M <Call:5>KD4RR <QSL_Rcvd:1>Y <QSL_Sent:1>Y',
              '<Band:3>40M <Call:5>K7RRR <QSL_Sent:1>Y',
              '<Band:3>40M <Call:5>W7FAL <QSL_Rcvd:1>Y <QSL_Sent:1>Y');
    my $num_elements = @array;
    my $counter = 0;
    while ($counter < $num_elements) {
        $array[$counter] =~ /(QSL_RCVD:|QSL_Rcvd:)(\d+)>(\w+)/;
        if ($3 eq "Y") {
            print "Good Record: $array[$counter]\n";
            $counter++
        }
        else {
            print "Bad Record: $array[$counter]\n";
            $counter++
        }
    }

Problem Description: When I run the above code,
I would expect the 2nd element in the array to
be printed as a bad record, as there is not a match
for the pattern being searched for (it doesn't have a
QSL_Rcvd field). However, it appears the pattern is
matched and it is printed as a good record. If I
re-arrange the order of the elements in the array,
and put the second element as the first element, then
the code works as I expected and the first element
is printed as a bad record.

From observation it looks like there is some kind of
persistence in $3, but that seems unlikely, so I am
thinking the problem is somehow related to my search
expression. Any help or insight would be appreciated.

Re: Problem with regexp or saved match variable.
by ig (Vicar) on Aug 27, 2009 at 05:28 UTC

    Your suspicion about the persistence of $3 is correct. If the pattern doesn't match, $3 is not modified. You might consider something like the following:

        use strict;
        use warnings;

        my @array = ('<Band:3>40M <Call:5>KD4RR <QSL_Rcvd:1>Y <QSL_Sent:1>Y',
                     '<Band:3>40M <Call:5>K7RRR <QSL_Sent:1>Y',
                     '<Band:3>40M <Call:5>W7FAL <QSL_Rcvd:1>Y <QSL_Sent:1>Y');

        foreach (@array) {
            if(/<QSL_Rcvd:\d+>Y/i) {
                print "Good Record: $_\n";
            }
            else {
                print "Bad Record: $_\n";
            }
        }
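
    The persistence itself can be seen in isolation with a minimal sketch like the one below. The helper name capture_after_failed_match and the two sample strings are made up for illustration and are not part of the original post.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: runs two matches in the same scope and reports
    # what $1 holds afterwards.
    sub capture_after_failed_match {
        my ($first, $second) = @_;
        $first  =~ /Rcvd:(\w)/;   # expected to succeed and set $1
        $second =~ /Rcvd:(\w)/;   # if this fails, $1 is left untouched
        my $seen = $1;            # copy the value before leaving the sub
        return $seen;
    }

    # The second string has no "Rcvd:" field, yet $1 still holds the capture
    # from the first match.
    print capture_after_failed_match('Rcvd:Y', 'Sent:Y'), "\n";   # prints "Y"
    ```

    This is why the result of the match itself should be tested before $1, $2 or $3 is trusted.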

      Thanks for the explanation on persistence of the match variables. I was seeing it, but didn't believe it! I searched my Perl books, but didn't see that tidbit of information! Guess I need to upgrade my Perl library.

        There are references to many excellent sources in Getting Started with Perl but with so much information it can be difficult to find the tidbit you need sometimes.

        Another excellent way to learn is by experiment and test. With practice your testing will become more thorough and your confidence in your own observations will grow accordingly.

        In this case, I don't know that a better library would suffice. I think you have found a bug in perl. Capture buffers says:

        The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, "$`", "$'", and $^N) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See "Compound Statements" in perlsyn.)

        According to this, I would expect the numbered match variables to go out of scope at the end of each iteration of the loop block and each new iteration to begin without them set, unless they were set in an outer context. But this is not what happens.

        The behavior is quite different if a continue BLOCK is added to the loop. Consider the following:

            use strict;
            #use warnings;

            my @array = ('<Band:3>40M <Call:5>KD4RR <QSL_Rcvd:1>Y <QSL_Sent:1>Y',
                         '<Band:3>40M <Call:5>K7RRR <QSL_Sent:1>Y',
                         '<Band:3>40M <Call:5>W7FAL <QSL_Rcvd:1>Y <QSL_Sent:1>Y');

            foreach (@array) {
                if(/<QSL_Rcvd:\d+>(Y)/i) {
                    print "Good Record: $_\n";
                }
                else {
                    print "Bad Record: $_\n";
                }
                print "in loop: \$1 = $1\n";
            }
            continue {
                my $var = 1;
                print "in continue: \$1 = $1\n";
            }
            print "after loop: \$1 = $1\n";

        Which produces

            Good Record: <Band:3>40M <Call:5>KD4RR <QSL_Rcvd:1>Y <QSL_Sent:1>Y
            in loop: $1 = Y
            in continue: $1 =
            Bad Record: <Band:3>40M <Call:5>K7RRR <QSL_Sent:1>Y
            in loop: $1 =
            in continue: $1 =
            Good Record: <Band:3>40M <Call:5>W7FAL <QSL_Rcvd:1>Y <QSL_Sent:1>Y
            in loop: $1 = Y
            in continue: $1 =
            after loop: $1 =

        With the continue block present, $1 is not set in the second iteration of the loop. One might argue that without the continue block the foreach statement doesn't leave its block until after the last iteration, but I am not aware of any documentation that says so and I find this behavior to be quite surprising. It seems obvious that the block is entered, executed and exited once with each iteration, whether there is a continue block present or not. Yet this is not the behavior, at least in perl 5.10.0.

        bug report

Re: Problem with regexp or saved match variable.
by jwkrahn (Abbot) on Aug 27, 2009 at 05:28 UTC

    The variables $1, $2 and $3 keep their old values when the match fails so when the second element doesn't match the values from the first match are used.

    You need to verify that the match succeeded before using the numerical variables. Something like:

        my @array = (
            '<Band:3>40M <Call:5>KD4RR <QSL_Rcvd:1>Y <QSL_Sent:1>Y',
            '<Band:3>40M <Call:5>K7RRR <QSL_Sent:1>Y',
            '<Band:3>40M <Call:5>W7FAL <QSL_Rcvd:1>Y <QSL_Sent:1>Y',
        );

        for ( @array ) {
            if ( /QSL_R(?:CVD|cvd):\d+>(\w+)/ && $1 eq 'Y' ) {
                print "Good Record: $_\n";
            }
            else {
                print "Bad Record: $_\n";
            }
        }
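
    Another way to avoid stale capture variables, offered only as a sketch, is to take the captures from the match in list context and never read $1 at all. The helper name qsl_rcvd and the shortened sample records are assumptions for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns the QSL_Rcvd value, or undef when the
    # field is absent from the record.
    sub qsl_rcvd {
        my ($record) = @_;
        # In list context a successful match returns its captures and a
        # failed match returns an empty list, so $value is undef on failure.
        my ($value) = $record =~ /<QSL_Rcvd:\d+>(\w+)/i;
        return $value;
    }

    for my $record ('<Call:5>KD4RR <QSL_Rcvd:1>Y', '<Call:5>K7RRR <QSL_Sent:1>Y') {
        my $value  = qsl_rcvd($record);
        my $status = (defined $value && $value eq 'Y') ? 'Good' : 'Bad';
        print "$status Record: $record\n";
    }
    ```

    Because $value is a fresh lexical on every call, nothing from an earlier record can leak into the test for a later one.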
Re: Problem with regexp or saved match variable.
by ssandv (Hermit) on Aug 27, 2009 at 05:43 UTC
    The reason this and many other variables are persistent is that they're "global special variables", which is to say, main::variable_name has some magic to it (in this case, $main::3 is the third captured match). The upshot is, once $3 gets set by a match, it can't go away by going out of scope. (Ignore that, it was wrong. Match variables are implicitly dynamically scoped for extra magic, as ig points out in the reply below.) perlvar could provide some further enlightenment on the subject (or might bury you in details). As the other answers have said, you need to test that the match succeeded before you use the result.

      I'm not sure, but it seems otherwise to me. Consider the following example:

          my $_ = '1';    # lexical $_ (available in 5.10, removed in Perl 5.24)
          /(.)/;
          print "before: \$1 = $1\n";
          for (2..4) {
              print "\$_ = $_\n";
              print "first: \$1 = $1\n";
              /(3)/;
              print "second: \$1 = $1\n";
          }
          continue {}
          print "after: \$1 = $1\n";

      Which produces

          before: $1 = 1
          $_ = 2
          first: $1 = 1
          second: $1 = 1
          $_ = 3
          first: $1 = 1
          second: $1 = 3
          $_ = 4
          first: $1 = 1
          second: $1 = 1
          after: $1 = 1

      At least it is not the case that once a capture variable is set it can't go away by going out of scope. Not only does $1 of the inner scope go away after the loop is completed, but the previous value of the outer scope's $1 remains, unaltered by $1 having been set in the inner scope. And the $1 of the inner scope doesn't even persist from one iteration of the loop to the next. While this effect could be achieved by localization of a global variable, perlre says explicitly that the numbered match variables are dynamically scoped. And my understanding of dynamically scoped is that they are lexical variables, not localized global variables. Of course, I could be wrong (and I was: I confused dynamic and lexical scoping), as could the documentation. I haven't looked at the implementation.

        Maybe I'm misunderstanding what you're trying to say in your last paragraph, but localizing a variable *is* dynamic scoping. my is lexical scoping.
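
        To make the distinction concrete, here is a small sketch. The variable and sub names ($global, show_global, with_local, with_my) are invented for this example.

        ```perl
        use strict;
        use warnings;

        our $global = 'outer';   # package variable, so local() can be applied to it

        # Reads whatever value the package variable currently has.
        sub show_global { return $global }

        sub with_local {
            local $global = 'inner';   # dynamic scoping: callees see 'inner' until this sub returns
            return show_global();
        }

        sub with_my {
            my $global = 'inner';      # lexical scoping: a separate variable, invisible to callees
            return show_global();
        }

        print with_local(), "\n";    # prints "inner"
        print with_my(), "\n";       # prints "outer"
        print show_global(), "\n";   # prints "outer" (the localized value has been restored)
        ```

        The match variables behave like the local case: the value set inside a block is visible to anything called from it, and the previous value comes back when the block is left.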

        Note that if you use while things get hairier:

            my $i=1;
            $i=~/(.)/;
            while ($i++<=4) {
                print "\$i = $i\n";
                print "first: \$1 = $1\n";
                $i=~/(3)/;
                print "second: \$1 = $1\n";
            }
            print "after: \$1 = $1\n";

        Gives

            $i = 2
            first: $1 = 1
            second: $1 = 1
            $i = 3
            first: $1 = 1
            second: $1 = 3
            $i = 4
            first: $1 = 3
            second: $1 = 3
            after: $1 = 1

        Clearly there's some dynamic scoping going on, which is likely to save a lot of people from small mistakes. But in my example, it's all one block of code, so the inner result persists. I'm pretty sure it's easy to come up with lots of real-world examples of iterating with while and a capturing regex...so, caveat programmer.