Polynomial has asked for the wisdom of the Perl Monks concerning the following question:

Suppose I use an s/// to alter a string but also to capture some substrings en passant. Suppose I have lots of pairs of capturing parentheses, so in order to put each substring into its own scalar I have to do something like my ($alpha, $beta, $gamma, $delta, $epsilon, $zeta) = ($1, $2, $3, $4, $5, $6);. But that seems redundant, as well as a little dangerous, since every time I add or remove a pair of capturing parentheses I have to remember not only to add or remove an lvalue but also to resize the list of rvalues.

What I'd really like to be able to do is something like my ($alpha, $beta, $gamma, $delta, $epsilon, $zeta) = @SOMEVAR;, where @SOMEVAR is one of those obscure punctuation variables. Except no such variable seems to exist. I can imagine various hacks to simulate one, like using symbolic references to collect $1 through ${@+}, and I can imagine workarounds, like using named captures. But I think there really ought to be a simple, fast, unlikely-to-break way to get all the (anonymous) captures from the last successful match as a list. Anybody know of one?

(Note that since I'm using s/// instead of m//, the return value of the pattern-matching operation isn't helpful.)

Replies are listed 'Best First'.
Re: Get all captured substrings after a substitution
by Fletch (Bishop) on Aug 13, 2009 at 14:34 UTC

    You've got the offsets you need in @- and @+, you just need to use them to pull it out.

    my @matches = do {
        my $str = $_;    # must hold the string as it was when the match ran
        map { substr( $str, $-[$_], $+[$_] - $-[$_] ) } 1 .. $#-;
    };

    The cake is a lie.

      Rather than using substr, eval might be a more straightforward way (although admittedly somewhat hackish by the standard of the OP) to pull out the contents of substitution capture groups to either an array or a set of lexical variables. The die statement serves as a (perhaps half-assed) check that the number of captures is as expected.
      >perl -wMstrict -le
      "sub do_something_with {
         print q{in do_something_with: '}, join(q{' '}, @_), q{'};
         return join '', reverse @_;
         }
       my $s = 'abc ABC abc';
       $s =~ s{ (a)(b)(c) }
              { die qq{wrong number ($#-) of captures} if $#- != 3;
                my ($aye, $bee, $cee) = map eval qq{\$$_}, 1 .. $#-;
                do_something_with($aye, $bee, $cee);
              }xmsige;
       print $s;
      "
      in do_something_with: 'a' 'b' 'c'
      in do_something_with: 'A' 'B' 'C'
      in do_something_with: 'a' 'b' 'c'
      cba CBA cba
      Note that every capture group defined in the regex is always represented in @- even if it does not actually match anything. See what happens if the regex is changed to
          $s =~ s{ (a)(b)(c)(d)? }{ ... }xmsige;
      with appropriate adjustment of the number against which $#- is checked in the die statement.
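
      For anyone who wants to try that experiment, here is a minimal sketch that just reports both counts after a substitution with an optional group. perlvar describes $#+ as the number of groups in the pattern and $#- as the last matched group, so what $#- reports for an unmatched optional group may depend on the perl version; the input string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $s = 'abc';    # hypothetical input with nothing for (d)? to match

$s =~ s{ (a)(b)(c)(d)? }{xyz}xms
    or die "No match\n";

# Snapshot both counts straight after the substitution.
my ( $n_groups, $last_matched ) = ( $#+, $#- );

print 'groups in pattern ($#+): ',  $n_groups,     "\n";
print 'last matched group ($#-): ', $last_matched, "\n";
```

      If the two numbers differ on your perl, checking against $#+ is probably the steadier way to confirm the number of capture groups.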
Re: Get all captured substrings after a substitution
by kennethk (Abbot) on Aug 13, 2009 at 14:34 UTC
    One possibility which is admittedly ugly but should be reasonably future-proof is to use @+ and @- in combination with substr to pull the values out of the original string; as quoted from the perlvar entry for @-:

    After a match against some variable $var:
      * $` is the same as substr($var, 0, $-[0])
      * $& is the same as substr($var, $-[0], $+[0] - $-[0])
      * $' is the same as substr($var, $+[0])
      * $1 is the same as substr($var, $-[1], $+[1] - $-[1])
      * $2 is the same as substr($var, $-[2], $+[2] - $-[2])
      * $3 is the same as substr($var, $-[3], $+[3] - $-[3])
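
    Applied to s///, those identities only hold against the string as it was before the substitution, so a copy has to be taken first. A minimal sketch, assuming a made-up input string and pattern; $orig, @captures and the other names are illustrative only:

```perl
use strict;
use warnings;

my $text = 'size: 640x480 px';    # hypothetical input
my $orig = $text;                 # copy taken before s/// modifies $text

$text =~ s/(\d+)x(\d+)\s*(\w+)/$2 by $1 $3/
    or die "No match\n";

# @- and @+ hold the start and end offsets of each group in the last
# successful match, which here is the one performed by s///.
# Iterating to $#+ yields one slot per group in the pattern; a group
# that did not take part in the match becomes undef.
my @captures = map {
    defined $-[$_] ? substr( $orig, $-[$_], $+[$_] - $-[$_] ) : undef
} 1 .. $#+;

my ( $width, $height, $unit ) = @captures;
print "$width, $height, $unit\n";
```

    Adding or removing a pair of parentheses then only changes the list of lvalues; the right-hand side resizes itself.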
Re: Get all captured substrings after a substitution
by ikegami (Patriarch) on Aug 13, 2009 at 14:42 UTC
    my ($alpha, $beta, $gamma, $delta, $epsilon, $zeta) = $s =~ /.../ or die("No match\n");

    That's an awful lot of vars, though. It seems an array would be better.
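
    A small sketch of the array variant, assuming a plain m// and a made-up date string; in list context the match returns the captures directly, so no punctuation variable is needed:

```perl
use strict;
use warnings;

my $s = '2009-08-13';    # hypothetical input

# In list context, m// returns ($1, $2, ...) on success and an empty
# list on failure, so the "or die" fires when nothing matched.
my @parts = $s =~ /^(\d{4})-(\d\d)-(\d\d)$/
    or die "No match\n";

my ( $year, $month, $day ) = @parts;
print "$year/$month/$day\n";
```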

    Update: Missed that you were using s///. Capturing and replacing at the same time seems rather odd. What are you doing?

Re: Get all captured substrings after a substitution
by Limbic~Region (Chancellor) on Aug 13, 2009 at 18:02 UTC
    Polynomial,
    With Perl 5.10.x, there is support for named captures. If you can't guarantee perl 5.10.x, then there is Regexp::NamedCaptures by diotalevi but I have no idea how stable it is. I know you said anonymous but doesn't your problem go away if you no longer treat them that way?

    Cheers - L~R

      Turns out I actually used m//g, not s///; goodness knows why I thought otherwise. And it does look like the best solution for my situation is indeed to use named captures, since I'm running perl 5.10.0. I just have to make sure to copy %+ into a lexical hash so all the captures remain accessible after I've used another regex. In future cases where I really do want separate lexicals as opposed to a hash (and hence anonymous as opposed to named captures), I guess I can use @+ and @- with substr or eval, as has been suggested; I'd just want to benchmark each method to see which is fastest. Thank you, kind monks.
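
      Roughly what that looks like, as a sketch under assumed names (the input line, the key/value group names and @records are all invented for illustration): each pass of the m//g loop copies %+ into a fresh anonymous hash, so a later regex cannot disturb the saved captures.

```perl
use strict;
use warnings;
use 5.010;    # named captures need perl 5.10 or later

my $line = 'alpha=1 beta=2';    # hypothetical input

my @records;
while ( $line =~ /(?<key>\w+)=(?<value>\d+)/g ) {
    push @records, {%+};    # snapshot of %+ for this match
}

# A later successful match overwrites %+ ...
'gamma=3' =~ /(?<key>\w+)=(?<value>\d+)/ or die "No match\n";

# ... but the lexical copies are unaffected.
say "$_->{key} => $_->{value}" for @records;
```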