sulfericacid has asked for the wisdom of the Perl Monks concerning the following question:

I started rereading Learning Perl 3 again (though I still must say I'm not impressed with the book in its entirety) and came across a few questions regarding chapters 9 and 10. This is where they start talking about using backreferences and memory variables.

The book gives the examples below but I don't understand what the number in the \ / is doing at the end and how it's changing the results. Can someone (in different words than the book describes) explain what the purpose of this number is and how it's affecting the two lines below?

I understand the number is the reference to the block of parentheses in chronological order. \1/ takes the data stored in (fred|wilma) which is either fred or wilma. But outside of that, I am really confused with this.

/((fred|wilma) (flintstone) \2/ could be fred flinstone fred
/((fred|wilma) (flintstone) \1/ could be fred flinstone fred flinstone
In chapter 10 it goes back into memory variables using $1..whatever which I understand, but it no longer uses the numbers in \1/ or \2/ as it did earlier. Why is that?
if ($wilma =~ /(\w+)/) { print "Wilma's word was $1"; }
Thanks for all your wisdom!

"Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

sulfericacid

Replies are listed 'Best First'.
Re: memory variables (chp. 9 & 10 Learning Perl 3)
by demerphq (Chancellor) on Sep 17, 2003 at 22:32 UTC

    First off, I think you've mistyped that code. I'm pretty sure that both those regexes would error out. Assuming that they should actually have the first unmatched "(" removed, the regexes don't match the explanation next to them, so assuming you've written them verbatim it's not hard to see why you are confused. :-) Let's look at them again (ignoring whitespace in the description):

    /(fred|wilma) (flintstone) \2/
    # match "fred" or "wilma" and put them in bucket 1
    # followed by "flintstone" and put that in bucket 2
    # then match whatever is in bucket 2. Which in this
    # case must be "flintstone" so the regex is
    # equivalent to
    #   /(fred|wilma) (flintstone) flintstone/

    /(fred|wilma) (flintstone) \1/
    # match "fred" or "wilma" and put them in bucket 1
    # followed by "flintstone" and put that in bucket 2
    # then match whatever is in bucket 1. Which in this
    # case could be "fred" or "wilma" so the regex is
    # equivalent to one of the following:
    #   /(fred) (flintstone) fred/
    #   /(wilma) (flintstone) wilma/

    In chapter 10 it goes back into memory variables using $1..whatever which I understand, but it no longer uses the numbers in \1/ or \2/ as it did earlier. Why is that?

    The thing to remember with backreferences is that, unlike using the capture after the match has occurred, the contents of the capture are used as part of the pattern and are evaluated before the entire regex has completed. This means we can say match AXAZ or BXBZ as /(A|B)X\1Z/ instead of /(AXA|BXB)Z/. A $1 inside a pattern, by contrast, is the text captured by the last successful match, not a capture from this match.

    #!perl -l
    print $_="the thing that that thing does";
    /(\w+) that/    and printf '$&=%-20s $1=%-10s %s',$&,$1,$/;
    /($1) (\w+)/    and printf '$&=%-20s $1=%-10s $2=%-10s %s',$&,$1,$2,$/;
    /$1 (\w+) (\1)/ and printf '$&=%-20s $1=%-10s $2=%-10s %s',$&,$1,$2,$/;
    __END__
    the thing that that thing does
    $&=thing that           $1=thing
    $&=thing that           $1=thing      $2=that
    $&=thing that that      $1=that       $2=that

    As we can see, first we do a "normal" match. We grab the word in front of "that" and put it in $1. We then match again, but now we are going to find the word following what is now in $1 ("thing"), and just to be crafty we put whatever matches $1 into capture bucket 1; the word that follows it goes into bucket 2. We then match again, and this time we match the contents of $1 followed by a word which we capture into bucket 1, and then we match whatever is in bucket 1 again and put it into bucket 2. So you can use $1 and \1 in the same regex and they mean _very_ different things.

    So a capture from the previous match can be used in a new match. But we also need to be able to talk about captures from this match and use them in the pattern as well. So basically the \1 etc. are captures from this match before the entire match is completed (it may fail later on in the pattern, but by that time the \1 will mean whatever _might_ have been captured had the overall pattern succeeded).
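
    A minimal sketch of that point, for anyone who wants to try it. The sample strings and variable names are made up for illustration and are not from the book. The idea is that \1 has to repeat whatever bucket 1 grabbed during this same match, so mixed pairs such as "AXBZ" should be rejected:

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for illustration.
my @samples = (
    'AXAZ', 'BXBZ', 'AXBZ',
    'fred flintstone fred', 'fred flintstone wilma',
);

my @results;
for my $s (@samples) {
    # \1 must repeat exactly what group 1 captured in this same match.
    my $letters = $s =~ /^(A|B)X\1Z$/ ? 1 : 0;
    my $names   = $s =~ /^(fred|wilma) (flintstone) \1$/ ? 1 : 0;
    push @results, "$s: letters=$letters names=$names";
}

print "$_\n" for @results;
```

    The anchors ^ and $ are only there so each sample has to match as a whole; they are not part of the book's patterns.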

    HTH


    ---
    demerphq

    <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
Re: memory variables (chp. 9 & 10 Learning Perl 3)
by Paladin (Vicar) on Sep 17, 2003 at 22:06 UTC
    First, the backreference is just \1; the last / is part of the m// or s/// operator.

    As for the difference between \1 and $1, the \1 is used within the same regex as the capturing parens, while the $1 is used outside the regex.

    For example, in:

    if (m/(\w+)\1/) { print "Found $1 twice\n"; }
    the \1 matches whatever was matched in the () within the same regex, and the $1 refers to the same matched item, but outside the regex.
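
    A runnable version of the same idea, as a rough sketch. The input string and the $found variable are assumptions added purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical input; any string with a doubled run of word characters works.
my $line  = 'say byebye now';
my $found = '';

if ($line =~ /(\w+)\1/) {
    # Inside the pattern, \1 had to repeat the text of group 1.
    # Out here, after the match has succeeded, that same text is in $1.
    $found = $1;
    print "Found $found twice\n";
}
```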

    Hope this clears it up a bit.

Re: memory variables (chp. 9 & 10 Learning Perl 3)
by Cody Pendant (Prior) on Sep 18, 2003 at 05:47 UTC
    I don't think anyone has pointed out something which I think Sulferic was confused about, that the \1, \2 etc can overlap when there are brackets inside brackets.

    Here's an example:

    $string = 'my name is mister smith';
    if($string =~ /((?:mr\.|mister) (\w+))/){
        # if the string contains either 'mr.' or
        # 'mister', then a space, then some word chars
        # matching for instance 'mister jones' but not
        # 'hey mister!'
        # (?:...) groups the two titles without capturing,
        # so the surname is still bucket 2
        print "surname is $2";
        # $1 contains the whole thing, $2 just the surname
    }

    I came up with this one for an auto-html syntax thing.

    Paste the URL and put optional link text after it in brackets.

    $text='Learning Perl? http://www.perlmonks.org/ <this> is the place';
    $text =~ s!(http://\S+) (\s+<([^>]+)>)? !
        defined($3)?
        qq(<a href="$1">$3</a>):
        qq(<a href="$1">$1</a>)
        !gex;
    print $text;


    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print
Re: memory variables (chp. 9 & 10 Learning Perl 3)
by mooseboy (Pilgrim) on Sep 18, 2003 at 03:34 UTC

    As others have said, \1, \2 etc are used inside the pattern, and $1, $2 outside it. But I'm pretty sure of two things: a) you're missing a closing paren in the examples and b) you've reproduced what I assume is a typo in the book (p111 in my copy), where it says 'flintsone' instead of 'flintstone' (no doubt merlyn will correct me if I'm wrong in my assumption ;-). So I presume it should really be:

    /((fred|wilma) (flintstone)) \2/ # could be fred flintstone fred
    /((fred|wilma) (flintstone)) \1/ # could be fred flintstone fred flintstone

    HTH, mooseboy

      Assuming that mooseboy is correct, and it is simply that you left out a ")"...
      I do not have a copy of this book on hand, so I do not know the surrounding text; however, this might help to clear things up a bit:
      /((fred|wilma) (flintstone)) \2/ # could be fred flintstone fred
      /((fred|wilma) (flintstone)) \1/ # could be fred flintstone fred flintstone
      /((fred|wilma) (flintstone)) \3/ # could be fred flintstone flintstone
      \1 is what is contained in the first paren encountered (up to its closing paren, which you left out), so in this case it will always contain both \2 and \3.
      ((\2)\1(\3))
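
      A small sketch to check the numbering yourself. The test string and the @groups array are assumptions for illustration only. Buckets are numbered by where their opening paren appears, counting from the left:

```perl
use strict;
use warnings;

# Hypothetical test string for the corrected pattern.
my $text = 'fred flintstone fred';
my @groups;

if ($text =~ /((fred|wilma) (flintstone)) \2/) {
    @groups = ($1, $2, $3);
    print "1: $1\n";   # outer group: everything inside the first "("
    print "2: $2\n";   # first inner group
    print "3: $3\n";   # second inner group
}
```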

      cheers, -xtype