PerlMonks  

How do I reference repeated capture groups?

by TIOOWTDI (Initiate)
on Aug 12, 2022 at 14:24 UTC ( [id://11146121]=perlquestion )

TIOOWTDI has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks

I stumbled over this two-year-old Reddit question from a user "onion" and am not too convinced by the answers given.

Is using Regexp::Grammars really the only way to do it? Seems like overkill...

Suppose I have this regular expression:
    my $re = qr{(\w+)(\s*\d+\s*)*};

How do I get every match matched by the second group?
Using the regular numeric variables only gets me the last value matched, not the whole list:

    my $re = qr{(\w+)(\s*\d+\s*)*};
    my $str = 'a 1 2 3 b 4 5 6';
    while ($str =~ /$re/g) {
        say "$&: $1 $2";
    }
    # output:
    # a 1 2 3 : a 3
    # b 4 5 6: b 6

How do I get every number that follows a letter in this example, and not just the last one?

EDIT

Bonus question:

How do I do it if I have named groups? I.e. my $re = qr{(?<letter>\w+)(?<digit>\s*\d+\s*)*};

Replies are listed 'Best First'.
Re: How do I reference repeated capture groups?
by kcott (Archbishop) on Aug 13, 2022 at 02:40 UTC

    G'day TIOOWTDI,

    Welcome to the Monastery.

    You actually want to capture zero or more instances of '(\s*\d+\s*)'; i.e. '((?:\s*\d+\s*)*)'.

    Your OP code:

    $ perl -E '
        my $re = qr{(\w+)(\s*\d+\s*)*};
        my $str = "a 1 2 3 b 4 5 6";
        while ($str =~ /$re/g) {
            say "$&: $1 $2";
        }
    '
    a 1 2 3 : a 3
    b 4 5 6: b 6

    With fixed regex:

    $ perl -E '
        my $re = qr{(\w+)((?:\s*\d+\s*)*)};
        my $str = "a 1 2 3 b 4 5 6";
        while ($str =~ /$re/g) {
            say "$&: $1 $2";
        }
    '
    a 1 2 3 : a 1 2 3
    b 4 5 6: b 4 5 6

    Named captures don't change the regex logic. The start of capture groups changes from '(' to '(?<name>'; and, accessing values changes from '$N' to '$+{name}'.

    $ perl -E '
        my $re = qr{(?<letter>\w+)(?<digit>(?:\s*\d+\s*)*)};
        my $str = "a 1 2 3 b 4 5 6";
        while ($str =~ /$re/g) {
            say "$&: $+{letter} $+{digit}";
        }
    '
    a 1 2 3 : a 1 2 3
    b 4 5 6: b 4 5 6

    — Ken

      It's not specifically part of your question; however, it occurs to me that you might not want to capture all that excess leading and trailing whitespace. Compare these:

      $ perl -E '
          my $re = qr{(\w+)((?:\s*\d+\s*)*)};
          my $str = "a 1 2 3 b 4 5 6";
          while ($str =~ /$re/g) {
              say "|$&|: |$1| |$2|";
          }
      '
      |a 1 2 3 |: |a| | 1 2 3 |
      |b 4 5 6|: |b| | 4 5 6|

      $ perl -E '
          my $re = qr{(\w+)\s+((?:\s*\d+)*)};
          my $str = "a 1 2 3 b 4 5 6";
          while ($str =~ /$re/g) {
              say "|$&|: |$1| |$2|";
          }
      '
      |a 1 2 3|: |a| |1 2 3|
      |b 4 5 6|: |b| |4 5 6|

      — Ken

      from the original discussion on Reddit:

      Okay but I also just want a list of each of the matches so I can parse them separately. Think attributes in html, a tag can have multiple, and being able to handle each individually is useful

      ...

      that still doesn't give me

      (a => [1, 2, 3], b => [4, 5, 6]) where in my analogy the letters are the html tags and the numbers are the attributes

      your answer with "1 2 3" in a string has already been given multiple times.

      BTW: The OP's moniker is Timegazer not Onion
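
      The structure asked for in that quote can probably be built without Regexp::Grammars. Here is a minimal sketch, assuming it is acceptable to capture the whole run of numbers once and then pull the individual numbers out with a second, list-context match. The %tags name and the [a-z]+ restriction on the letter part are illustrative assumptions, not something from the thread.

```perl
use strict;
use warnings;
use feature 'say';

my $str = 'a 1 2 3 b 4 5 6';

# Assumption: the "letter" part is restricted to [a-z]+ here, because \w
# would also match digits.
my %tags;
while ( $str =~ /([a-z]+)((?:\s+\d+)*)/g ) {
    # Copy the captures first: the inner match below overwrites $1 and $2.
    my ( $letter, $run ) = ( $1, $2 );

    # A list-context /g match returns every number in the captured run.
    $tags{$letter} = [ $run =~ /\d+/g ];
}

say "$_ => [@{ $tags{$_} }]" for sort keys %tags;
```

      For the sample string this should leave %tags holding (a => [1, 2, 3], b => [4, 5, 6]). The trade-off is that each run of numbers is scanned twice, which is usually acceptable for short inputs.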

Re: How do I reference repeated capture groups?
by LanX (Saint) on Aug 12, 2022 at 14:56 UTC
        To meet these explicit requirements, here is an altered version.

        NB: note how similar those solutions look in the end...

        use v5.12;
        use warnings;
        use Data::Dump;

        my $str = 'a 1 2 3 b 4 5 6';

        my $re_start  = qr{(?<start>\w+)\s*};
        my $re_repeat = qr{\s*(?<repeat>\d+)\s*};

        # --- approach 1 with \G and /gc
        my @res1;
        while ( $str =~ /\G$re_start/g ) {
            push @res1, [$+{start}];
            while ($str =~ /\G$re_repeat/gc) {
                push @{ $res1[-1] }, $+{repeat};
            }
        }
        dd \@res1;

        # --- approach 2 with embedded (?{CODE})
        my @res2;
        $str =~ /
            (?:
                $re_start
                (?{ push @res2, [$+{start}] })
                (?:
                    $re_repeat
                    (?{ push @{ $res2[-1] }, $+{repeat} })
                )*
            )*
        /gx;
        dd \@res2;

        output
        [["a", 1, 2, 3], ["b", 4, 5, 6]]
        [["a", 1, 2, 3], ["b", 4, 5, 6]]

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: How do I reference repeated capture groups?
by Fletch (Bishop) on Aug 12, 2022 at 14:43 UTC

    Is there something wrong with the (obvious?) just surround both groups with another outer set of parens (which then shifts the numerics down by one)? Should work with a named mark as well.

    my $re = qr/(?<whole_enchilada> (?<Letter>\w+) (?<Digit>\s*\d+\s*)* )/x;

    Edit: NVM, misread what was being asked. You could wrap the second set of parens outside the quantifier I believe but that's not going to preserve the separate matches. ENOCAFFEINE . . .

    Edit 2: Playing with %- and friends I'm not finding anything obvious. The other suggestion about using minor ebil like embedded code pushing to an array sounds like the most promising.
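
    A small sketch of why %- does not help here, using the OP's named-group regex on a shortened sample string (the @digit_slots name is only illustrative). As far as I understand perlvar, %- keeps one slot per capture group that carries a given name, not one slot per iteration of a quantified group:

```perl
use strict;
use warnings;
use feature 'say';

my $str = 'a 1 2 3';
my @digit_slots;

if ( $str =~ /(?<letter>\w+)(?<digit>\s*\d+\s*)*/ ) {
    # $-{digit} is an array ref with one entry per group *named* digit.
    # There is only one such group in the pattern, so only one entry,
    # and it holds whatever the last iteration of the quantifier captured.
    @digit_slots = @{ $-{digit} };
}

say scalar(@digit_slots), " slot(s): [@digit_slots]";
```

    So the earlier iterations are simply gone by the time the match returns, which is why collecting them during the match (embedded code) or re-matching afterwards seem to be the remaining options.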

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: How do I reference repeated capture groups?
by Anonymous Monk on Aug 13, 2022 at 06:06 UTC

Node Type: perlquestion [id://11146121]
Approved by marto
Front-paged by kcott