in reply to Another regex variable puzzle

This has got to be a bug. You're not going crazy. The $DIGIT variables should NOT get altered if there is no match. This is documented:
The scope of $<digit> (and $`, $&, and $') extends to the end of the enclosing BLOCK or eval string, or to the next successful pattern match, whichever comes first.
Here's the odd thing. This program works as expected:

    use strict;
    my ($f1,$f2);

    ($f1, $f2) = 'XaaXbbX' =~ /X(\w+)X(\w+)X/;
    print "\$1 = $1; \$2 = $2\n";
    print "\$f1 = $f1; \$f2 = $f2\n";

    ($f1, $f2) = 'XXX' =~ /X(\w+)X(\w+)X/;
    print "\$1 = $1; \$2 = $2\n";
    print "\$f1 = $f1; \$f2 = $f2\n";

    __END__
    $1 = aa; $2 = bb
    $f1 = aa; $f2 = bb
    $1 = aa; $2 = bb
    $f1 = ; $f2 =

But this program doesn't:

    use strict;
    my ($f1,$f2);

    $_ = 'XaaXbbX';
    ($f1, $f2) = /X(\w+)X(\w+)X/;  # first attempt
    print "\$1 = $1; \$2 = $2\n";
    print "\$f1 = $f1; \$f2 = $f2\n";

    $_ = 'XXX';
    ($f1, $f2) = /X(\w+)X(\w+)X/;  # first attempt
    print "\$1 = $1; \$2 = $2\n";
    print "\$f1 = $f1; \$f2 = $f2\n";

    __END__
    $1 = aa; $2 = bb
    $f1 = aa; $f2 = bb
    $1 = XX; $2 = bb
    $f1 = ; $f2 =

Hmm, it seems to have something to do with matching against a variable ($_) instead of a string literal. Oh, and running this code under use re 'debug' shows that the second regex NEVER GETS RUN (this is a good thing, too, since that regex needs at least 5 characters, and there are only 3, so Perl knows not to bother).

HOLY (expletive)! I just uncovered something very bad about Perl. Please watch:

    ($_ = "ABCD") =~ /(..)(..)/;
    print "$1, $2\n";
    $_ = "WXYZ";
    print "$1, $2\n";

    __END__
    AB, CD
    AB, CD

That looks fine, right? Now watch THIS:

    () = ($_ = "ABCD") =~ /(..)(..)/;
    print "$1, $2\n";
    $_ = "WXYZ";
    print "$1, $2\n";

    __END__
    AB, CD
    WX, YZ

This shows that when you (supposedly) store the returned parenthetical matches from a pattern match, Perl LINKS the digit variables to SECTIONS of the string! This is probably less than good.
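
To make the "linked to sections of the string" idea concrete, here is a rough sketch that emulates the effect in plain Perl. It is not how the regex engine works internally. It assumes a Perl new enough to have @- and @+, records the capture offsets from them, and re-applies those offsets with substr after the string changes. The names $s, @spans and @linked are made up for illustration.

```perl
use strict;
use warnings;

my $s = "ABCD";
$s =~ /(..)(..)/ or die "no match";

# Record (offset, length) for each capture group instead of copying the text.
my @spans = map { [ $-[$_], $+[$_] - $-[$_] ] } 1 .. $#+;

# Change the string without running another match.
$s = "WXYZ";

# Re-reading the same sections now yields text from the new string.
my @linked = map { substr($s, $_->[0], $_->[1]) } @spans;
print join(", ", @linked), "\n";    # prints "WX, YZ"
```

Copying the captured text at match time, rather than keeping offsets around, is what would make the values survive the later assignment.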

This happens in 5.005_02, as well as 5.6.0. I'll submit a bug report.

japhy -- Perl and Regex Hacker

Re: Re: Another regex variable puzzle
by premchai21 (Curate) on Mar 03, 2001 at 21:47 UTC
    Tested and verified. Apparently the offsets of the sections stay fixed, and if the string becomes shorter, the $<digit> variables pick up pieces of the new string plus a character or two past its end. If you match again, though, they get reset as they should be. What I tried:

        use re 'debug';
        use strict;
        my (@rere,$rere);

        $_="hello world";
        @rere = /(\w+)\s+(\w+)/;
        $rere = "(".join(",",@rere).")";
        print "\$1 = $1, \$2 = $2, \$rere = $rere\n";

        $_="foo bar";
        print "\$1 = $1, \$2 = $2, \$rere = $rere\n";

        @rere = /(\w+)\s+(\w+)/;
        $rere = "(".join(",",@rere).")";
        print "\$1 = $1, \$2 = $2, \$rere = $rere\n";

    with the following output:

        Compiling REx `(\w+)\s+(\w+)'
        size 15 first at 4
           1: OPEN1(3)
           3:   PLUS(5)
           4:     ALNUM(0)
           5: CLOSE1(7)
           7: PLUS(9)
           8:   SPACE(0)
           9: OPEN2(11)
          11:   PLUS(13)
          12:     ALNUM(0)
          13: CLOSE2(15)
          15: END(0)
        stclass `ALNUM' plus minlen 3
        Compiling REx `(\w+)\s+(\w+)'
        size 15 first at 4
           1: OPEN1(3)
           3:   PLUS(5)
           4:     ALNUM(0)
           5: CLOSE1(7)
           7: PLUS(9)
           8:   SPACE(0)
           9: OPEN2(11)
          11:   PLUS(13)
          12:     ALNUM(0)
          13: CLOSE2(15)
          15: END(0)
        stclass `ALNUM' plus minlen 3
        Matching REx `(\w+)\s+(\w+)' against `hello world'
        [...]
        Match successful!
        $1 = hello, $2 = world, $rere = (hello,world)
        $1 = foo b, $2 = r rld, $rere = (hello,world)
        Matching REx `(\w+)\s+(\w+)' against `foo bar'
        [...]
        Match successful!
        $1 = foo, $2 = bar, $rere = (foo,bar)
        Freeing REx: `(\w+)\s+(\w+)'
        Freeing REx: `(\w+)\s+(\w+)'

    As for the null character, that might be a byproduct of C(++) string processing -- for those who may not know, in C(++) strings are terminated with a null.

Re: Re: Another regex variable puzzle
by Rudif (Hermit) on Mar 04, 2001 at 03:23 UTC
    You're not going crazy.
    Thank you japhy, that's reassuring :-)

    The scope of $<digit> (and $`, $&, and $') extends to the end of the enclosing BLOCK or eval string, or to the next successful pattern match, whichever comes first.

    But it does not say that those vars will not be mangled by a passing-by unsuccessful match :-(

    What you discovered/confirmed (Perl LINKS the digit variables to SECTIONS of the string) should be documented in bold letters right in the perlre page where a Perl newbie first meets those variables. Did you submit that bug report?

    It would be good if the behavior of $<digit> variables could be specified and implemented such that at any time they either contain the result of the last successful match or are undefined. In the meantime, I will remember not to rely on them and use the list assignment.
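
    A minimal sketch of that list-assignment habit, assuming the same pattern as in japhy's examples. The @report array and the loop are only there for illustration. The point is that the captures are copied into lexicals and used only when the match actually succeeded, so $1 and $2 are never read at all:

```perl
use strict;
use warnings;

my @report;
for my $input ('XaaXbbX', 'XXX') {
    # In list context a failed match returns an empty list, so the
    # assignment counts zero elements and the condition is false.
    if (my ($f1, $f2) = $input =~ /X(\w+)X(\w+)X/) {
        push @report, "$input: f1=$f1 f2=$f2";
    }
    else {
        push @report, "$input: no match";
    }
}
print "$_\n" for @report;
```

    With these two inputs the first line reports aa and bb, and the second reports no match.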

    Rudif
      I reported and fixed the bug. All it took was removing a statement in pp_hot.c that said, in English, "if the pattern match is happening in list context, don't make copies of $1, $2, etc."

      So in the next version of Perl, your bug will be no more. It was a peculiar bug... behavior-wise, at least. I guess they thought that if you're storing the digit variables in other variables, there's no need to make copies of them, but rather, link those digit variables to parts of the string directly.

      japhy -- Perl and Regex Hacker

        All it took was removing a statement in pp_hot.c

        Thank you for a hot fix :-)

        Rudif