in reply to Finally, a $& compromise!

Again, any regex with capturing parentheses will always have support for $& because of the mechanism that provides $DIGIT variables.

Does this mean that a regexp that captures $1 implies $& which implies the performance hit for maintaining $`, $&, and $' ? I thought that you only took on the performance hit if you explicitly used $`, $&, or $'.

Re: Re: Finally, a $& compromise!
by japhy (Canon) on Nov 28, 2001 at 07:38 UTC
    Does this mean that a regexp that captures $1 implies $& which implies the performance hit for maintaining $`, $&, and $' ?
    Only for maintaining them for that regex. The way that $DIGIT variables are supported is thus:
    1. The string being matched against is copied (via savepvn()) to rx->subbeg.
    2. The offsets of the $DIGIT vars are stored in the two arrays rx->startp and rx->endp.
    3. When you access $2, Perl does magic:
      1. It takes the beginning and ending offsets, rx->startp[2] and rx->endp[2], and takes a substring of rx->subbeg.
      2. It savepvn()s (copies) that substring to a scalar and returns it.
    However, this only happens in a regex that has capturing parentheses! If you have a regex that does NOT have capturing parentheses, it does not need to copy the string.
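
The steps above can be approximated from plain Perl, since the offsets are exposed to user code as the @- and @+ arrays. This is only a sketch of the idea, not the actual C internals: the helper name capture_from_offsets and the variable $copy (standing in for rx->subbeg) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rebuild group $n from a saved copy of the string
# plus the start/end offsets, the way the steps above describe.
sub capture_from_offsets {
    my ($copy, $n, $starts, $ends) = @_;
    return undef unless defined $starts->[$n];   # group did not take part
    return substr($copy, $starts->[$n], $ends->[$n] - $starts->[$n]);
}

my $string = "capture";
my $copy   = $string;        # assumption: plays the role of rx->subbeg
my (@starts, @ends);

if ($string =~ /(.t)(.)/) {
    @starts = @-;            # user-visible start offsets
    @ends   = @+;            # user-visible end offsets
}

my $g0 = capture_from_offsets($copy, 0, \@starts, \@ends);  # whole match
my $g1 = capture_from_offsets($copy, 1, \@starts, \@ends);
my $g2 = capture_from_offsets($copy, 2, \@starts, \@ends);

print "<$g0><$g1><$g2>\n";   # prints <ptu><pt><u>
```

Group 0 falls out of the same copy and the same offsets, which is why a regex that can give you $1 can also give you the whole match.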

    The $DIGIT vars are like tiny instances of $& that only appear when you need them, whereas $& is maintained for every match once you use it anywhere in the program. Here's an example showing that a regex with capturing parentheses gives you the ability to use $& and friends. These are three separate programs. I'm using eval '' so that Perl hasn't seen $& by the time the regexes are executed.

    #!/usr/bin/perl
    "simple" =~ /im/ and eval q{ print "<$`><$&><$'>\n" };
    ###
    #!/usr/bin/perl
    "complex" =~ /.p/ and eval q{ print "<$`><$&><$'>\n" };
    #<co><mp><lex>
    ###
    #!/usr/bin/perl
    "capture" =~ /(.t)./ and eval q{ print "<$`><$&><$'>:<$1>\n" };
    #<ca><ptu><re>:<pt>
    Does that make sense? In order to have $1, you have to have the string that is also used for $&. From perlre:
    WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression (?: ... ) instead.) But if you never use $&, $` or $', then patterns without capturing parentheses will not be penalized. So avoid $&, $', and $` if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid the price. As of 5.005, $& is not so costly as the other two.
    Some of that will be rewritten with the advent of this pragma, though. It's nice to "rewrite the books".
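
As a small illustration of the (?: ... ) advice in the perlre quote, here is a sketch comparing a grouping-only pattern with a capturing one. It only shows that the non-capturing form sets up no groups; it does not measure the cost. The variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "foobar";
my ($plain_groups, $capture_groups, $captured);

# Grouping without capturing: no $1 is set up for this match.
if ($text =~ /(?:foo|baz)bar/) {
    $plain_groups = $#+;     # number of groups in the last successful match
}

# Same alternation, but with capturing parentheses.
if ($text =~ /(foo|baz)bar/) {
    $capture_groups = $#+;
    $captured       = $1;
}

print "non-capturing: $plain_groups group(s)\n";              # 0 group(s)
print "capturing: $capture_groups group(s), <$captured>\n";   # 1 group(s), <foo>
```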

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;