cfreak has asked for the wisdom of the Perl Monks concerning the following question:

Earlier today I ran across a small problem with a script of mine. I never realized that a parenthetical capture does not reset the $1 ... $n variables, nor does it reset $&. Here's a test case:

#!/usr/bin/perl
use strict;
my $count = 0;
while(<DATA>) {
    $_ =~ /(foo)/;
    if($1) {
        print "$count: $1 -- $&\n";
        $count ++;
    }
}
exit;
__DATA__
foo
bar
baz
foobar
fjkdlsa
jewklf
fdlkjsa
jfj49i
fjdals;
foo
fjkdsla

When run this outputs:

0: foo -- foo
1: foo -- foo
2: foo -- foo
3: foo -- foo
4: foo -- foo
5: foo -- foo
6: foo -- foo
7: foo -- foo
8: foo -- foo
9: foo -- foo
10: foo -- foo

Not exactly what I was looking for.

So my question is: is this a feature or a bug? Also, I didn't find anything in perlre that really warned of this gotcha. If it's a feature, my suggestion is that it should be better documented (if it is, please show me where! :) )

Replies are listed 'Best First'.
•Re: Regex variable capturing gotcha
by merlyn (Sage) on Aug 04, 2004 at 18:17 UTC
    perldoc perlre:
    The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, $', and $^N) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See "Compound Statements" in perlsyn.)
    NOTE: failed matches in Perl do not reset the match variables, which makes easier to write code that tests for a series of more specific cases and remembers the best match.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.
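
The two rules in the perlre passage quoted above (scoping to the enclosing block, and failed matches leaving the variables alone) can be seen side by side in a minimal sketch. The sub name scoping_demo and the sample strings are made up for illustration; they are not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: records the value of $1 at four points.
sub scoping_demo {
    my @seen;
    my ($outer, $inner) = ("outer", "inner");

    $outer =~ /(out)/;       # successful match: $1 is now "out"
    push @seen, $1;
    {
        $inner =~ /(inn)/;   # successful match inside a nested block
        push @seen, $1;
        $inner =~ /(zzz)/;   # failed match: $1 is left untouched
        push @seen, $1;
    }
    push @seen, $1;          # block has ended: the outer value is visible again
    return @seen;
}

print join(", ", scoping_demo()), "\n";   # prints: out, inn, inn, out
```

The third value is the stale one: it comes from the earlier successful match, which is the same effect as in the original test case.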

Re: Regex variable capturing gotcha
by Aristotle (Chancellor) on Aug 04, 2004 at 18:21 UTC

    Yes, it does.

    $ perl -le'"foo" =~ /(foo)/; print $1; "bar" =~ /(bar)/; print $1;'
    foo
    bar

    What you got wrong is that a failed match will not change the values:

    $ perl -le'"foo" =~ /(foo)/; print $1; "bar" =~ /(bar)/; print $1; "baz" =~ /(quux)/; print $1;'
    foo
    bar
    bar

    Of course then, if all you're matching for is /(foo)/, and you succeed on the very first match, $1 will never be anything other than foo. What you need to do is check whether your match succeeded:

    while(<DATA>) {
        if( $_ =~ /(foo)/ ) {
            print "$.: $1 -- $&\n";
        }
        else {
            print "No match on line $..\n"
        }
    }

    This is documented behaviour.

    As a sidenote, instead of keeping a line counter yourself, you can use Perl's $. variable, which contains the number of the last line read from the last accessed filehandle.

    Makeshifts last the longest.
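
Another way to act on the advice above, not mentioned in the replies, is to take the captures from the match's return value in list context rather than from $1. A failed match returns an empty list there, so a stale value cannot be read by accident. This is only a sketch; the sub name captures_for and the pattern /(foo\w*)/ are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured text for each matching line.
sub captures_for {
    my @lines = @_;
    my @found;
    for my $line (@lines) {
        # In list context the match returns its captures, or an empty
        # list on failure, so the block only runs after a real match.
        if (my ($hit) = $line =~ /(foo\w*)/) {
            push @found, $hit;
        }
    }
    return @found;
}

print join(", ", captures_for(qw(foo bar baz foobar))), "\n";   # prints: foo, foobar
```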

      As a sidenote, instead of keeping a line counter yourself, you can use Perl's $. variable,
      Good advice, but it looks like it's a match count, not a line count.
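
The difference between the two counters can be checked with a short sketch that keeps both: $count goes up once per successful match, while $. follows the number of lines read. The sub name count_both and the in-memory filehandle are assumptions made so the example is self-contained; the original script reads from DATA.

```perl
use strict;
use warnings;

# Hypothetical helper: reports the match count and the line number together.
sub count_both {
    my ($text) = @_;
    # Read the string through an in-memory filehandle so $. is maintained.
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my $count = 0;
    my @report;
    while (my $line = <$fh>) {
        if ($line =~ /(foo)/) {
            push @report, "match $count on line $.: $1";
            $count++;
        }
    }
    close $fh;
    return @report;
}

print "$_\n" for count_both("foo\nbar\nbaz\nfoobar\n");
# prints:
# match 0 on line 1: foo
# match 1 on line 4: foo
```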
Re: Regex variable capturing gotcha
by ccn (Vicar) on Aug 04, 2004 at 18:20 UTC

    It's a feature, see perlretut, perlre

    Your code can be rewritten this way:

    while(<DATA>) {
        if( $_ =~ /(foo)/ ) {
            print "$count: $1 -- $&\n";
            $count ++;
        }
    }
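
Testing the match itself, as in the rewrite above, has a second benefit over the original if($1): a capture can succeed and still be false in Perl, for example when the captured text is "0". The sketch below compares the two tests; the sub name compare_tests, the pattern /(\d)/ and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: compares two ways of deciding whether a digit was captured.
sub compare_tests {
    my ($input) = @_;
    my $by_match   = 0;
    my $by_capture = 0;
    if ($input =~ /(\d)/) {
        $by_match   = 1;          # the match operator reported success
        $by_capture = 1 if $1;    # stays 0 when the captured text is "0"
    }
    return ($by_match, $by_capture);
}

printf "%s: match=%d capture=%d\n", $_, compare_tests($_) for qw(a7 a0 abc);
# prints:
# a7: match=1 capture=1
# a0: match=1 capture=0
# abc: match=0 capture=0
```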