kepler has asked for the wisdom of the Perl Monks concerning the following question:

Good morning,

I have this small question: I'm trying to extract a match from a text of the type "expression1 expression2 expression3".

I'm doing this successfully with an array: @source = $text =~ m/(regex1)(regex2)(regex3)/g; The values are kept in $source[0], $source[1] and $source[2].

The problem is that expression2 might repeat itself 1, 2 or more times. So I'm using @source = $text =~ m/(regex1)(regex2)+(regex3)/g; But when I try to access $source[1], I get only the last match of the possible repetitions of expression2, not all of them... I've tried a simple print "$source[1]" and even "$source[1][0] $source[1][1]". Can someone please tell me what I'm doing - obviously - wrong?

Kind regards,

Kepler

Replies are listed 'Best First'.
Re: Regex troubles...
by choroba (Cardinal) on Apr 20, 2016 at 11:18 UTC
    That's how repeated captures work: each capture group has exactly one slot, so $2 always belongs to the second group and keeps only its last iteration. The earlier iterations can't spill over into $3 (maybe it should return an array reference?).
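The behaviour described above can be reproduced with a minimal sketch. The sample string and the stand-in patterns (`a` for regex1, `b.` for regex2, `d` for regex3) are assumptions chosen for illustration, not the original poster's data.

```perl
use strict;
use warnings;

# Hypothetical sample: 'a' plays regex1, 'b.' plays regex2, 'd' plays regex3.
my $text = 'ab1b2b3d';

# The quantifier sits outside the capture group, so group 2 is
# overwritten on every iteration and only the final one survives.
my @source = $text =~ /(a)(b.)+(d)/;

print join(',', @source), "\n";   # a,b3,d
```

The list still has exactly three elements, one per capture group, no matter how often the middle group repeats.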

    I'd solve this in two steps. In the first one, match the whole repetition, then split it into single expressions:

    my @source = $text =~ /(regex1)((?:regex2)+)(regex3)/g;
    my @repeated = $source[1] =~ /.../g; # or split or whatever
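A runnable version of the two-step idea might look like the sketch below. The sample string and the concrete patterns (`a`, `b.`, `d`) are assumptions standing in for regex1, regex2 and regex3; substitute the real expressions as needed.

```perl
use strict;
use warnings;

# Hypothetical sample data and stand-in patterns.
my $text = 'ab1b2b3d';

# Step 1: capture the whole run of repetitions as a single string.
my @source = $text =~ /(a)((?:b.)+)(d)/
    or die "no match\n";

# Step 2: re-match that run to pull out each repetition separately.
my @repeated = $source[1] =~ /b./g;

print "$source[0] | @repeated | $source[2]\n";   # a | b1 b2 b3 | d
```

Keeping the run in one capture first means the order check (regex1, then regex2 repeated, then regex3) is still done by a single match.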

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Hi, Thank you very much for the solution - I've been programming in Perl for some years and I must confess that I wasn't aware of this important detail in regexes that you kindly provided. Kind regards, Kepler
        You can also store the matching parts in an array in a (?{}) expression:
        #!/usr/bin/perl
        use warnings;
        use strict;
        use feature qw{ say };

        for my $string (qw( ab1b2b3d ab1b2x )) {
            my @two;
            my @matches = $string =~ /(a) (b.(?{ push @two, $2 if defined $2 }))+ (d)/x;
            say for @matches, '---', @two, '======';
        }

        ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Regex troubles...
by LanX (Saint) on Apr 20, 2016 at 11:22 UTC
    it's possible to hack a recursive regex somehow but it's much easier to put the parens around the quantifier and split the result again.

      ((?:regex2)+)

    HTH :)

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

Re: Regex troubles...
by AnomalousMonk (Archbishop) on Apr 23, 2016 at 22:50 UTC

    Just for grins, here's another approach, although I would still recommend the two-step approach outlined above:

    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     print qq{Perl version: $]};
     ;;
     my $ra = qr{ a }xms;
     my $rb = qr{ b. }xms;
     my $rc = qr{ c }xms;
     for my $string (qw(ab1b2b3b4c ab5b6b7c ab8b9c abxc ac b0)) {
       my @all = $string =~ m{
           \G (?: $ra (?= $rb) | $rb (?= $rb | $rc) | $rc (?= \z) )
           }xmsg;
       print qq{'$string' -> }, pp \@all;
       }
    "
    Perl version: 5.008009
    'ab1b2b3b4c' -> ["a", "b1", "b2", "b3", "b4", "c"]
    'ab5b6b7c' -> ["a", "b5", "b6", "b7", "c"]
    'ab8b9c' -> ["a", "b8", "b9", "c"]
    'abxc' -> ["a", "bx", "c"]
    'ac' -> []
    'b0' -> []
    Note that this runs under 5.8.9.

Re: Regex troubles...
by Anonymous Monk on Apr 20, 2016 at 13:50 UTC
      Please use code tags when providing source here   <code>

      Then please note that this regex matches more cases, because the order constraints are lost!

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

      Interesting, this ideone, if it really lets you run actual Perl code. Anyway, it's best in the future if you also post the code here in code tags. Here is the output on my machine:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my $text = 'XYYYZ';
      my @source;
      push @{$source[$1 ? 0 : $2 ? 1 : 2]}, $& while $text =~ /(X)|(Y)|(Z)/g;
      print Dumper \@source;
      __END__
      $VAR1 = [
                [
                  'X'
                ],
                [
                  'Y',
                  'Y',
                  'Y'
                ],
                [
                  'Z'
                ]
              ];

      update: yeah wrong window, pasted the wrong ideone, fixed
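The remark about lost order constraints can be illustrated with a small comparison. This is only a sketch: the helper names `count_alternation` and `matches_in_order` are hypothetical, and `X Y+ Z` is an assumed stand-in for the intended "regex1, repeated regex2, regex3" sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: how many times the alternation loop would fire.
sub count_alternation {
    my ($text) = @_;
    my $n = 0;
    $n++ while $text =~ /(X)|(Y)|(Z)/g;
    return $n;
}

# Hypothetical helper: does the text really follow the order X, Y..., Z?
sub matches_in_order {
    my ($text) = @_;
    return $text =~ /\A X Y+ Z \z/x ? 1 : 0;
}

for my $text (qw( XYYYZ ZYX )) {
    printf "%s: alternation hits %d, ordered match %d\n",
        $text, count_alternation($text), matches_in_order($text);
}
```

For 'XYYYZ' both checks succeed (5 hits, ordered), while for 'ZYX' the alternation still finds 3 pieces even though the sequence is in the wrong order, which is the extra set of cases the reply warns about.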

Re: Regex troubles...
by polettix (Vicar) on Apr 24, 2016 at 22:16 UTC
    Hack using (?{ code }), which is not experimental any more at least as of 5.22(.0+) judging from the docs, and was available at least as of 5.8(.8+), although I don't know how stable it has been in time:
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use English qw( -no_match_vars );
    use Data::Dumper;

    my $string = 'foobarbaaaarbaz';
    my @second;
    if (my @matches = $string =~ m{\A (fo*) (?: (?<BAR>\s*ba+r) (?{push @second, $^N}))+ (\s*ba*z) \z}mxs ) {
        $matches[1] = \@second;
        print Dumper \@matches;
    }

    # $VAR1 = [
    #           'foo',
    #           [
    #             'bar',
    #             'baaaar'
    #           ],
    #           'baz'
    #         ];

    Update: as noted below in a comment, the capture BAR is not needed. I added it while playing with %- and %+, and forgot to remove it before posting the example... I'm leaving it though, so that the comment below can still make sense. Anyway, always remember the old adage about regular expressions...

    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Io ho capito... ma tu che hai detto?
      (?: (?<BAR>\s*ba+r) (?{push @second, $^N}))+

      What is the advantage of using the  (?<BAR>\s*ba+r) named capture group rather than an ordinary capture group? The name is never referred to anywhere.
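As a quick check of that point, here is a sketch of the same pattern with an ordinary capture group in place of the named one. It assumes a reasonably recent Perl, where lexical variables inside `(?{ ... })` behave as expected; `$^N` refers to the most recently closed group, so no name is required.

```perl
use strict;
use warnings;

my $string = 'foobarbaaaarbaz';
my @second;

# Plain capture group instead of (?<BAR>...); $^N holds the text of the
# group that closed most recently, which is the one just before the code block.
my @matches = $string =~ m{
    \A (fo*)
    (?: (\s*ba+r) (?{ push @second, $^N }) )+
    (\s*ba*z) \z
}xms;

print "@second\n";   # bar baaaar
```

One caveat worth keeping in mind: the code block runs as the engine advances, so a pattern that backtracks over the group could push entries for attempts that are later abandoned. With this particular string no such backtracking occurs.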


      Give a man a fish:  <%-{-{-{-<