Rudif has asked for the wisdom of the Perl Monks concerning the following question:

While practicing my newly acquired wisdom, I tried this:

#! perl -w
use strict;
$|++;
my $source = 'blahperlblahjavablah';
my ($f1, $f2) = $source =~ /blah(\w+)blah(\w+)blah/; # first attempt
print "#1==$1==$2==\n";
print "#2==$f1==$f2==\n";
$source = 'blahblahblah';
($f1, $f2) = $source =~ /blah(\w+)blah(\w+)blah/; # second attempt
print "#3==$1==$2==\n";
print "#4==$f1==$f2==\n";
printf "#5==%s\n", unpack "H*", "$2";

which produced the following output:

H:\devperl\perlmonks>scoping-assign.pl
#1==perl==java==
#2==perl==java==
#3==blah== ava==
#4======
#5==00617661
OK, lines #1 and #2 confirm the obvious: the first attempt succeeds and captures two groups, which are reflected both in the variables $1, $2 and in my variables $f1, $f2.
The second attempt fails to match, which is reflected correctly in my variables $f1, $f2 (shown in #4), while the variables $1, $2 would be unchanged, or so I expected.

But look, $2 was changed, as shown in #3 and confirmed in #5: its first character was replaced by ASCII 0 (a null byte)!

Can anyone confirm and explain this? I believed that the $1, $2, etc. variables stay unchanged when the second match attempt is not successful. But it seems that they can contain garbage.

Obtained on Win2k, ActiveState Perl 5.6.0 build 623.

Rudif

Re: Another regex variable puzzle
by archon (Monk) on Mar 03, 2001 at 04:32 UTC
    As noted in Programming Perl in the 'Common Goofs for Novices' section:
    "Not saving $1, $2, and so on, across regular expressions. Remember that every new m/atch or s/ubsti/tute/ will set (or clear, or mangle) your $1, $2... variables, as well as $`, $', and $&..."
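    A minimal sketch of that advice, using a made-up helper called first_two_words: the captures are copied into lexicals as soon as the match succeeds, so a later successful match that replaces $1 cannot affect what the function returns.

```perl
use strict;
use warnings;

# Hypothetical helper: save the captures into lexicals immediately after a
# successful match; any later successful match (here, the digit check)
# replaces $1 and $2.
sub first_two_words {
    my ($text) = @_;
    return unless $text =~ /(\w+)\s+(\w+)/;
    my ($first, $second) = ($1, $2);    # saved right away
    if ($second =~ /(\d+)/) {           # another match; on success $1 changes
        print "second word contains the number $1\n";
    }
    return ($first, $second);           # copies, unaffected by the match above
}

my @words = first_two_words('item 42');
print "@words\n";
```

Reading $1 instead of $first after the digit check would give the digits rather than the first word.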

    I imagine Perl has to do this in order for backreferences, e.g. /blah(\w+)blah\1/, to work.

Re: Another regex variable puzzle
by japhy (Canon) on Mar 03, 2001 at 21:08 UTC
    This has got to be a bug. You're not going crazy. The $DIGIT variables should NOT get altered if there is no match. This is documented:
    The scope of $<digit> (and $`, $&, and $') extends to the end of the enclosing BLOCK or eval string, or to the next successful pattern match, whichever comes first.
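    A small illustration of that documented rule (the sub name trace_capture_scope is only for this example): a failed match leaves $1 alone, and leaving the block restores the value from the outer match, so a perl that follows the documentation should print b,y,y,b.

```perl
use strict;
use warnings;

# Record $1 after each step to show how its value is scoped.
sub trace_capture_scope {
    my @seen;
    'abc' =~ /(b)/;         # successful match: $1 is 'b'
    push @seen, $1;
    {
        'xyz' =~ /(y)/;     # successful match inside the block: $1 is 'y'
        push @seen, $1;
        'xyz' =~ /(q)/;     # failed match: $1 keeps 'y'
        push @seen, $1;
    }
    push @seen, $1;         # block has ended: $1 is 'b' again
    return @seen;
}

print join(',', trace_capture_scope()), "\n";
```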
    Here's the odd thing. This program works as expected:
    use strict;
    my ($f1,$f2);
    ($f1, $f2) = 'XaaXbbX' =~ /X(\w+)X(\w+)X/;
    print "\$1 = $1; \$2 = $2\n";
    print "\$f1 = $f1; \$f2 = $f2\n";
    ($f1, $f2) = 'XXX' =~ /X(\w+)X(\w+)X/;
    print "\$1 = $1; \$2 = $2\n";
    print "\$f1 = $f1; \$f2 = $f2\n";
    __END__
    $1 = aa; $2 = bb
    $f1 = aa; $f2 = bb
    $1 = aa; $2 = bb
    $f1 = ; $f2 =
    But this program doesn't:
    use strict;
    my ($f1,$f2);
    $_ = 'XaaXbbX';
    ($f1, $f2) = /X(\w+)X(\w+)X/; # first attempt
    print "\$1 = $1; \$2 = $2\n";
    print "\$f1 = $f1; \$f2 = $f2\n";
    $_ = 'XXX';
    ($f1, $f2) = /X(\w+)X(\w+)X/; # second attempt
    print "\$1 = $1; \$2 = $2\n";
    print "\$f1 = $f1; \$f2 = $f2\n";
    __END__
    $1 = aa; $2 = bb
    $f1 = aa; $f2 = bb
    $1 = XX; $2 = bb
    $f1 = ; $f2 =
    Hmm, it seems to depend on matching against a variable ($_) rather than a string literal. Oh, and running use re 'debug' on this code shows that the second regex NEVER GETS DONE (this is a good thing, too, since that second regex demands at least 5 characters and there are only 3, so Perl knows not to do it).

    HOLY (expletive)! I just uncovered something very bad about Perl. Please watch:

    ($_ = "ABCD") =~ /(..)(..)/;
    print "$1, $2\n";
    $_ = "WXYZ";
    print "$1, $2\n";
    __END__
    AB, CD
    AB, CD
    That looks fine, right? Now watch THIS:
    () = ($_ = "ABCD") =~ /(..)(..)/;
    print "$1, $2\n";
    $_ = "WXYZ";
    print "$1, $2\n";
    __END__
    AB, CD
    WX, YZ
    This shows that when you (supposedly) store the returned parenthetical matches from a pattern match, that is, when the match runs in list context, Perl LINKS the digit variables to SECTIONS of the string! This is probably less than good.
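    The "sections" can be seen directly: since Perl 5.6 the arrays @- and @+ hold the start and end offsets of each capture group in the last successfully matched string. This sketch (capture_spans is a hypothetical helper) prints those offsets and the substrings they select; it shows where Perl records captures, not the buggy code path itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, end] for each capture group of a
# successful match, read from the built-in arrays @- and @+.
sub capture_spans {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    return map { [ $-[$_], $+[$_] ] } 1 .. $#+;
}

my $subject = 'ABCD';
for my $span (capture_spans($subject, qr/(..)(..)/)) {
    my ($start, $end) = @$span;
    printf "offsets %d..%d -> '%s'\n",
        $start, $end, substr($subject, $start, $end - $start);
}
```

If the string behind those offsets is overwritten in place, the same offsets select different text, which is what the WX, YZ output above suggests.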

    This happens in 5.005_02, as well as 5.6.0. I'll submit a bug report.

    japhy -- Perl and Regex Hacker

      Tested and verified. Apparently, the sections stay static: if the string becomes shorter, the $<digit> variables show whatever now occupies those positions, namely the new string's characters, its terminating null, and leftover characters from the old string. If you run the match again, though, they get reset as they should be. What I tried:
      use re 'debug';
      use strict;
      my (@rere,$rere);
      $_="hello world";
      @rere = /(\w+)\s+(\w+)/;
      $rere = "(".join(",",@rere).")";
      print "\$1 = $1, \$2 = $2, \$rere = $rere\n";
      $_="foo bar";
      print "\$1 = $1, \$2 = $2, \$rere = $rere\n";
      @rere = /(\w+)\s+(\w+)/;
      $rere = "(".join(",",@rere).")";
      print "\$1 = $1, \$2 = $2, \$rere = $rere\n";
      with the following output:
      Compiling REx `(\w+)\s+(\w+)'
      size 15 first at 4
         1: OPEN1(3)
         3:   PLUS(5)
         4:     ALNUM(0)
         5: CLOSE1(7)
         7: PLUS(9)
         8:   SPACE(0)
         9: OPEN2(11)
        11:   PLUS(13)
        12:     ALNUM(0)
        13: CLOSE2(15)
        15: END(0)
      stclass `ALNUM' plus minlen 3
      Compiling REx `(\w+)\s+(\w+)'
      size 15 first at 4
         1: OPEN1(3)
         3:   PLUS(5)
         4:     ALNUM(0)
         5: CLOSE1(7)
         7: PLUS(9)
         8:   SPACE(0)
         9: OPEN2(11)
        11:   PLUS(13)
        12:     ALNUM(0)
        13: CLOSE2(15)
        15: END(0)
      stclass `ALNUM' plus minlen 3
      Matching REx `(\w+)\s+(\w+)' against `hello world'
      [...]
      Match successful!
      $1 = hello, $2 = world, $rere = (hello,world)
      $1 = foo b, $2 = r rld, $rere = (hello,world)
      Matching REx `(\w+)\s+(\w+)' against `foo bar'
      [...]
      Match successful!
      $1 = foo, $2 = bar, $rere = (foo,bar)
      Freeing REx: `(\w+)\s+(\w+)'
      Freeing REx: `(\w+)\s+(\w+)'

      As for the null character, it is most likely the string terminator: Perl keeps string data in C buffers, and C strings are terminated with a null byte, so the shorter new value is followed by a null and then by what remains of the old contents.
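      To see why the output looks the way it does, here is a hypothetical model, not Perl's actual internals: write the shorter value over the start of the old buffer, follow it with a null, keep the old bytes after that, and read the captures through their old offsets. Under that assumption it reproduces $1 = foo b and $2 = r<null>rld from the debug run, as well as Rudif's $2 of "\0ava" (hex 00617661). The sub names are made up for this sketch.

```perl
use strict;
use warnings;

# HYPOTHETICAL model of the bug, not Perl's real internals: a shorter new
# value is written over the start of the old buffer and followed by a null;
# the old bytes after that null are left in place.
sub reused_buffer {
    my ($old, $new) = @_;
    return $new if length($new) >= length($old);
    my $buf = $old;
    substr($buf, 0, length($new) + 1, $new . "\0");
    return $buf;
}

# Read a capture through its stale offsets, as the buggy $<digit> did.
sub stale_capture {
    my ($buf, $start, $end) = @_;
    return substr($buf, $start, $end - $start);
}

# 'hello world' matched /(\w+)\s+(\w+)/ with $1 at 0..5 and $2 at 6..11.
my $buf = reused_buffer('hello world', 'foo bar');
for my $span ([0, 5], [6, 11]) {
    (my $shown = stale_capture($buf, @$span)) =~ s/\0/\\0/g;  # make null visible
    print "$shown\n";
}
```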

      You're not going crazy.
      Thank you japhy, that's reassuring :-)

      The scope of $<digit> (and $`, $&, and $') extends to the end of the enclosing BLOCK or eval string, or to the next successful pattern match, whichever comes first.

      But it does not say that those vars will not be mangled by a passing-by unsuccessful match :-(

      What you discovered/confirmed (Perl LINKS the digit variables to SECTIONS of the string) should be documented in bold letters right in the perlre page, where a Perl newbie first meets those variables. Did you submit that bug report?

      It would be good if the behavior of $<digit> variables could be specified and implemented such that at any time they either contain the result of the last successful match or are undefined. In the meantime, I will remember not to rely on them and use the list assignment.
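      A sketch of that approach applied to the original example (parse_pair is a made-up name): the list assignment sits in boolean context, where it evaluates to the number of values the match returned, so a failed match is false and $f1, $f2 only ever hold results from this match.

```perl
use strict;
use warnings;

# A list assignment used as a condition evaluates to the number of values
# on its right-hand side, so a failed match (empty list) is false.
sub parse_pair {
    my ($source) = @_;
    if (my ($f1, $f2) = $source =~ /blah(\w+)blah(\w+)blah/) {
        return "$f1/$f2";    # values from this match only
    }
    return 'no match';       # never falls back to stale $1, $2
}

print parse_pair('blahperlblahjavablah'), "\n";
print parse_pair('blahblahblah'), "\n";
```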

      Rudif
        I reported and fixed the bug. All it took was removing a statement in pp_hot.c that said, in English, "if the pattern match is happening in list context, don't make copies of $1, $2, etc."

        So in the next version of Perl, your bug will be no more. It was a peculiar bug... behavior-wise, at least. I guess they thought that if you're storing the digit variables in other variables, there's no need to make copies of them, but rather, link those digit variables to parts of the string directly.
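        A quick way to check whether a given perl has the fix is to re-run japhy's and Rudif's examples inside subs (the sub names are made up for this check). On a perl that includes the fix, both should return the captures from the original string, "AB, CD" and "perl, java"; the 5.005_02 and 5.6.0 builds described in this thread gave "WX, YZ" and a mangled $2 instead.

```perl
use strict;
use warnings;

# japhy's list-context example: reassign $_ after the match, then read $1, $2.
sub after_reassign {
    local $_;
    () = ($_ = 'ABCD') =~ /(..)(..)/;
    $_ = 'WXYZ';
    return "$1, $2";
}

# Rudif's example: a second list-context match that fails (string too short).
sub after_failed_match {
    my $source = 'blahperlblahjavablah';
    my ($f1, $f2) = $source =~ /blah(\w+)blah(\w+)blah/;
    $source = 'blahblahblah';
    ($f1, $f2) = $source =~ /blah(\w+)blah(\w+)blah/;
    return "$1, $2";
}

print after_reassign(), "\n";
print after_failed_match(), "\n";
```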

        japhy -- Perl and Regex Hacker