Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Am I missing something here?
#!/usr/bin/perl -w
use strict;
while(<DATA>){
    /^(\w+)\s+(\w+)$/;
    print "<$1><$2>\n";
}
__DATA__
abc def
xyz 6-7
123 456

Output:

<abc><def>
<a><d>
<123><456>

The second line of data doesn't match, so shouldn't $1 and $2 either be undefined in that case (dynamically scoped to the end of their enclosing block or the next successful match, as stated in perlre) or retain their previous values? Why do they contain partial results? (Same result for 5.6.1 and 5.8.0.RC2.)

Re: partial results in $1 $2 after re failure
by blakem (Monsignor) on Jun 30, 2002 at 09:01 UTC
    Looks like a perl bug to me.... If you flatten out the while() loop:
    #!/usr/bin/perl -w
    use strict;

    $_ = "abc def\n";
    /^(\w+)\s+(\w+)$/;
    print "<$1><$2>\n";

    $_ = "xyz 6-7\n";
    /^(\w+)\s+(\w+)$/;
    print "<$1><$2>\n";

    $_ = "123 456\n";
    /^(\w+)\s+(\w+)$/;
    print "<$1><$2>\n";
    The output is
    <abc><def>
    <abc><def>
    <123><456>
    Which is what your snippet should have printed as well. FWIW, replacing the while(<DATA>) with a for(@data) type construct also produced erroneous results.

    Before using $1 et al., you really should check that your match succeeded:

    while (<DATA>) {
        if (/^(\w+)\s+(\w+)$/) {
            print "<$1><$2>\n";
        }
        else {
            print "Line $. didn't match\n";
        }
    }

    -Blake

Re: partial results in $1 $2 after re failure
by Aristotle (Chancellor) on Jun 30, 2002 at 11:05 UTC
    Seems to be a bug, however, explicitly capturing works:
    #!/usr/bin/perl -w
    use strict;
    while(<DATA>){
        my @w = /^(\w+)\s+(\w+)$/;
        print "<$w[0]><$w[1]>\n";
    }
    __DATA__
    abc def
    xyz 6-7
    123 456
    gives
    <abc><def>
    Use of uninitialized value in concatenation (.) or string at x line 5, <DATA> line 2.
    Use of uninitialized value in concatenation (.) or string at x line 5, <DATA> line 2.
    <><>
    <123><456>
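The list-context capture above can be combined with a success check, which avoids both $1/$2 and the uninitialized-value warnings. This is only a minimal sketch: the helper name parse_pair is hypothetical, and the data lines are the three from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the two captured words, or an empty
# list when the line does not match.
sub parse_pair {
    my ($line) = @_;
    # In list context a failed match returns an empty list.
    my @w = $line =~ /^(\w+)\s+(\w+)$/;
    return @w;
}

for my $line ("abc def\n", "xyz 6-7\n", "123 456\n") {
    # A list assignment in boolean context is false when the
    # right-hand side produced no elements.
    if (my ($first, $second) = parse_pair($line)) {
        print "<$first><$second>\n";
    }
    else {
        print "no match\n";
    }
}
```

Because the captures are copied into lexicals only when the match succeeds, nothing here depends on what $1 and $2 hold after a failed match.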
    ____________
    Makeshifts last the longest.
Re: partial results in $1 $2 after re failure
by Zaxo (Archbishop) on Jun 30, 2002 at 08:33 UTC

    '-' does not match \w.

    Update: AM, you're absolutely right, I missed the point. I get the same results. I suspect that the content of the loop is acting as one dynamic scope, but I have no idea why $1, $2 are losing string length information. Seems like a perl bug to me.

    After Compline,
    Zaxo

      '-' does not match \w.

      Exactly, the failure of the second line is intentional in order to show that partial previous results for $1 and $2 were still printed. Shouldn't the output for the failing line have been either <><> or <abc><def>?

Re: partial results in $1 $2 after re failure
by Anonymous Monk on Jul 01, 2002 at 03:38 UTC
    Try using /^(\S+)\s+(\S+)$/ or, for a somewhat narrower match, /^(\w+)\s+(\w[-|\w]\w)$/.
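A rough sketch for trying both suggested patterns against the sample data, capturing in list context so that no $1/$2 values are involved. The names non_space, word_dash and try_pattern are labels made up for this sketch. Note that inside a character class the | is a literal character, so [-|\w] also accepts a pipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ("abc def", "xyz 6-7", "123 456");

# Pattern names are labels for this sketch only.
my %patterns = (
    non_space => qr/^(\S+)\s+(\S+)$/,
    # The second field must be exactly three characters long; the
    # middle one may be a word character, '-' or a literal '|'.
    word_dash => qr/^(\w+)\s+(\w[-|\w]\w)$/,
);

# Returns the captured pair as "<a><b>", or undef when nothing matches.
sub try_pattern {
    my ($re, $line) = @_;
    my @w = $line =~ $re;
    return @w ? "<$w[0]><$w[1]>" : undef;
}

for my $name (sort keys %patterns) {
    for my $line (@lines) {
        my $result = try_pattern($patterns{$name}, $line);
        printf "%-9s %-8s %s\n", $name, $line,
            defined $result ? $result : "no match";
    }
}
```

Both patterns make the second data line a genuine match, so they sidestep the question of what $1 and $2 contain after a failure rather than answer it.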

    For some reason it seems to offset what it returns by the length of the string to the left (i.e. xyz). If you add another character, it will return an "e" rather than a "d", whereas if you remove a character it returns the " ".

    Weird stuff.
Re: partial results in $1 $2 after re failure
by ides (Deacon) on Jun 30, 2002 at 15:55 UTC
    As I was investigating this I found that if you leave out the '$' in the regex the script works "more correctly". Since '-' isn't caught by \w the output is as follows

    <abc><def>
    <xyz><6>
    <123><456>

    -----------------------------------
    Frank Wiles <frank@wiles.org>
    http://frank.wiles.org

      That's not "more correctly", because with your version of the regex the second line is an actual match. Obviously we then get that match's results, which is entirely expected behaviour and therefore doesn't help us here.
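To make the difference concrete, here is a small sketch that runs the anchored and the unanchored pattern against the second data line. The helper name match_pair is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the captures as "<a><b>", or "no match".
sub match_pair {
    my ($re, $line) = @_;
    if (my ($first, $second) = $line =~ $re) {
        return "<$first><$second>";
    }
    return "no match";
}

my $line = "xyz 6-7\n";

# Anchored: (\w+) can only take "6", and "$" cannot match before "-".
print match_pair(qr/^(\w+)\s+(\w+)$/, $line), "\n";

# Unanchored: the match simply stops after "6".
print match_pair(qr/^(\w+)\s+(\w+)/, $line), "\n";
```

This prints "no match" for the anchored pattern and "<xyz><6>" for the unanchored one, so the second case is a successful match and says nothing about the failure case.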

      Makeshifts last the longest.

Re: partial results in $1 $2 after re failure
by caedes (Pilgrim) on Jun 30, 2002 at 08:52 UTC
    I think this calls into question the exact meaning of 'dynamically scoped.'
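For reference, a minimal sketch of what perlre's "dynamically scoped" wording means in the simple case: a successful match inside a nested block sets $1 only until that block ends, after which the outer value is visible again. The sub name scope_demo is made up, and the sketch deliberately uses only successful matches, so it does not reproduce the failure case from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical demo: records $1 before, inside and after a nested block.
sub scope_demo {
    my @seen;
    if ("outer" =~ /(\w+)/) {
        push @seen, $1;            # set by the outer match
        {
            if ("inner" =~ /(\w+)/) {
                push @seen, $1;    # set by the inner match
            }
        }
        push @seen, $1;            # inner block has ended
    }
    return @seen;
}

print join(",", scope_demo()), "\n";
```

Under the documented scoping rules this should print outer,inner,outer.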

    -caedes