bdalzell has asked for the wisdom of the Perl Monks concerning the following question:

This is a simplified version of my problem:

#!/usr/bin/perl
$line="One two three four five";
if($line =~ m/(one)\s(two)\s(three)\s(four)\s(five)/i){
    print "1: $1\n";
    print "2: $2\n";
    print "3: $3\n";
    print "4: $4\n";
    print "5: $5\n";
}#if

produces the expected:

[me]$>perl -w regexp-test2.pl
1: One
2: two
3: three
4: four
5: five

BUT if you try to do anything to one of the returned values such as this:

#!/usr/bin/perl
$line="One two three four five";
if($line =~ m/(one)\s(two)\s(three)\s(four)\s(five)/i){
    $one=$1;
    $one =~ s/one/ONE/i;
    print "1: $one\n";
    print "2: $2\n";
    print "3: $3\n";
    print "4: $4\n";
    print "5: $5\n";
}#if

you lose the other returned values:

[me]$>perl -w regexp-test3.pl
1: ONE
Use of uninitialized value $2 in concatenation (.) or string at regexp-test2.pl line 12.
2: 
Use of uninitialized value $3 in concatenation (.) or string at regexp-test2.pl line 13.
3: 
Use of uninitialized value $4 in concatenation (.) or string at regexp-test2.pl line 14.
4: 
Use of uninitialized value $5 in concatenation (.) or string at regexp-test2.pl line 15.
5: 

Can anyone explain to me why this happens?

As far as my program is concerned, waiting until all the returned values are assigned and then processing them works, but I do not understand why doing something to a variable that was assigned its value from $1 results in losing the other returned values.

Re: Regexp and substitution question
by AnomalousMonk (Archbishop) on Aug 28, 2010 at 03:36 UTC

    Quoth perlre (see section on Capture buffers):

    The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, $', and $^N) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See "Compound Statements" in perlsyn.)

    Update: Try this with and without the  { } block enclosing the substitution statement (Update: changed example code to try to clarify the point a bit):

    >perl -wMstrict -le
    "my $line = 'One two three four five';
     if ($line =~ m/(one)\s(two)\s(three)\s(four)\s(five)/i){
       my $neo;
       { ($neo = $1) =~ s{ (o)(ne) }{$2$1}xmsi; }
       print qq{neo '$neo'};
       print qq{1: $1};
       print qq{2: $2};
       print qq{3: $3};
       print qq{4: $4};
       print qq{5: $5};
     }
    "
    neo 'neO'
    1: One
    2: two
    3: three
    4: four
    5: five
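The "with and without the block" experiment above can also be run as an ordinary script, which avoids shell quoting differences. This is only a minimal sketch: the pattern is shortened to two captures and the variable names ($with_block_1, $no_block_1, and so on) are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'One two three four five';

# Case 1: the substitution runs inside its own bare block.
my ($with_block_1, $with_block_2);
if ($line =~ m/(one)\s(two)/i) {
    my $neo;
    {
        # This match sets $1 and $2, but only for the inner block.
        ($neo = $1) =~ s{(o)(ne)}{$2$1}i;
    }
    # Leaving the inner block restores the outer match's captures.
    ($with_block_1, $with_block_2) = ($1, $2);
}

# Case 2: the same substitution with no block around it.
my ($no_block_1, $no_block_2);
if ($line =~ m/(one)\s(two)/i) {
    (my $neo = $1) =~ s{(o)(ne)}{$2$1}i;
    # $1 and $2 now belong to the substitution's match.
    ($no_block_1, $no_block_2) = ($1, $2);
}

print "with block:    $with_block_1 $with_block_2\n";   # One two
print "without block: $no_block_1 $no_block_2\n";       # O ne
```

In the second case the captures come from the substitution's own pattern, which is why the original match's values are no longer visible.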
Re: Regexp and substitution question
by ikegami (Patriarch) on Aug 28, 2010 at 03:41 UTC
    $1 represents the captures of the last successful pattern match. In this instance, that would be s/one/ONE/. Since it has no captures, $1 and friends are empty. Copy $1 and friends while they still contain what you want, or avoid using $1 and friends completely.
    my $line = "One two three four five";
    if (my @captures = $line =~ m/(one)\s(two)\s(three)\s(four)\s(five)/i) {
        $captures[0] =~ s/one/ONE/i;
        print "$_: $captures[$_-1]\n" for 1..5;
    }

    Note that uc might be more appropriate here. And maybe split.

    my $line = "One two three four five";
    if (my @captures = split(' ', $line)) {
        $captures[0] = uc($captures[0]);
        print "$_: $captures[$_-1]\n" for 1..5;
    }

      Thank you, everyone who took the time to answer this. The exercise of writing the request resulted in my realizing, when I awoke this morning, that $1 and the rest refer to the most recent match. BUT the explanations you provided are instructive beyond the answer to my query, and I have learned something that will further improve my program.

Re: Regexp and substitution question
by toolic (Bishop) on Aug 28, 2010 at 03:46 UTC
    The substitution operator will clear all match variables upon a successful pattern match. From perlre (emphasis mine):
    The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ($+ , $& , $` , $' , and $^N ) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first.

    If the substitution's pattern fails to match, no substitution is performed and the match variables retain their values:

    my $line="One two three four five";
    if($line =~ m/(one)\s(two)\s(three)\s(four)\s(five)/i){
        my $one=$1;
        $one =~ s/ten/ONE/i; # <--- no match
        print "1: $one\n";
        print "2: $2\n";
        print "3: $3\n";
        print "4: $4\n";
        print "5: $5\n";
    }#if
    __END__
    1: One
    2: two
    3: three
    4: four
    5: five
Re: Regexp and substitution question
by oko1 (Deacon) on Aug 28, 2010 at 15:40 UTC

    As other people have noted, a subsequent successful match resets the captured values. However, this doesn't have to create a problem; you can use the list behavior of captures in a regex to save them for reuse.

    #!/usr/bin/perl -w
    use strict;

    my $line="One two three four five";
    my @chunks;
    if(@chunks = $line =~ m/(one)\s(two)\s(three)\s(four)\s(five)/i){
        $chunks[0] =~ s/One/ONE/;   # Change the first chunk
        $chunks[1] =~ s/two/tWo/;   # Change the second chunk
        # ...
    }
    print "@chunks\n";

    Output:

    ONE tWo three four five

    --
    "Language shapes the way we think, and determines what we can think about."
    -- B. L. Whorf
      Absolutely - I can't emphasize enough how important it is to get in the habit of explicitly capturing your matches. It's easy to get lazy about using the values right away, and then your program develops mysterious bugs when someone else adds a "harmless" function call that invokes something else that internally does a pattern match, but only sometimes...

      I also recommend getting into the habit of naming your captures; that way it's much clearer: "now does $capture[4] contain the right value, or was it $capture[3]?". If you do this:

      my($name, $age, $weight) = ($line =~ /$pattern/);
      you'll be much less likely to set someone's age to 180 and their weight to 40.
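To make the named-variable idea concrete, here is a small sketch. The record layout (name, age, weight separated by whitespace), the sample data, and the pattern are all invented for illustration; only the list-assignment idiom comes from the post above. The second half shows Perl's own named capture groups (available since 5.10), a related option that the post does not mention.

```perl
use strict;
use warnings;

# Hypothetical record: name, age, weight.
my $record  = 'Alice 40 180';
my $pattern = qr/^(\w+)\s+(\d+)\s+(\d+)$/;

# List assignment copies the captures into named variables immediately,
# so later matches cannot disturb them.
my ($name, $age, $weight) = $record =~ $pattern
    or die "record did not match\n";

print "$name: age $age, weight $weight\n";

# Alternative: name the groups inside the pattern and read them from %+.
# %+ is reset by later matches just like $1, so copy it right away.
my %field;
if ($record =~ /^(?<name>\w+)\s+(?<age>\d+)\s+(?<weight>\d+)$/) {
    %field = %+;
}
print "$field{name}: age $field{age}, weight $field{weight}\n";
```

Either way, the values are read by name rather than by position, which is the point being made about mixing up an age and a weight.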