
For example I thought this would be it
$re = qr/beer=(\d{2}).*(vodka=(\d{2})){0,1}.*chips=(\d{3})/;

but it doesn't capture $2,$3 in str1 (why?).

Re^3: Likely trivial regex question
by graff (Chancellor) on Nov 09, 2011 at 09:54 UTC
    Well, first off, by putting parens around "vodka=(\d{2})", you've created another capture, but the approach that might have been "correct" also doesn't work:
    my $re = qr/beer=(\d{2}).*(?:vodka=(\d{2}))?.*chips=(\d{3})/;
    That "matches" both strings, but doesn't capture the number that follows "vodka" in the first string. Sorry, but I can't explain why.

    So I'd be inclined to take a different approach:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @strs = (
        "beer=10&otherstuff&vodka=20&otherstuff&chips=100",
        "beer=10&otherstuff&juice=20&otherstuff&chips=100"
    );
    my @targets = qw/beer vodka chips/;
    for ( @strs ) {
        my @matched;
        for my $target ( @targets ) {
            push @matched, $1 if ( /$target=(\d+)/ );
        }
        if ( @matched ) {
            print "matched: @matched\n";
        }
    }
    UPDATE: Here's a variant on that approach, which I think is closer to what your own snippet would actually do if it worked (in case that's really what you want):
    my @targets = qw/beer=(\d{2}) (vodka=(\d{2})) chips=(\d{3})/;
    for ( @strs ) {
        my @matched;
        for my $target ( @targets ) {
            push @matched, ( /$target/ );
        }
        if ( @matched ) {
            print "matched: @matched\n";
        }
    }
      Hi graff,

      Thanks for your efforts. As it happens, I'm particularly interested in the part you said you cannot explain either (the optional group not being captured).

      As a workaround I've written two regexes coupled with an if:
      my $re1 = qr/beer=(\d{2}).*chips=(\d{3})/;
      my $re2 = qr/vodka=(\d{2})/;
      if ( $str =~ /$re1/ ) {
          my $m1 = $1;
          my $m2 = $2;
          if ( $str =~ /$re2/ ) {
              # use that capture and go on...
          }
      }


      This does exactly what I want it to do; it's just that I'd like to learn if it's possible to achieve with a single regex. And if not, then why?

      While reading up on this and googling, I realized that I basically do not understand much about greedy/non-greedy quantifiers and/or optional groups.

      For example I am also puzzled why
      "cat:dog" =~ /(cat)*/;
      captures "cat", but
      "dog:cat" =~ /(cat)*/;
      doesn't.

      Cheers
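        Because in the case of "dog:cat" =~ /(cat)*/;, the "*" means "zero or more", and the regex engine takes the leftmost position at which the pattern can succeed. At position 0 (the leading "d" of "dog"), "cat" zero times already counts as a success, so the match is the empty string there and $1 is left undefined.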

        Here's an illustration that beats to death that aspect of your issue.

        #!/usr/bin/perl
        use Modern::Perl;

        my $str0 = "0 dog:cat" =~ /(cat)*/;
        say "\$str0 $str0";      # 1 -- original replaced by scalar value

        my $str1 = "1 dog:cat";
        say "\$str1: $str1";
        if ($str1 =~ /(cat)*/ ) {
            my $capture = $1;
            say "matched |$capture| in \$str1 ($str1) using * quantifier in regex";   # see output: uninit $capture
        }else{
            say "no match in \$str1 ($str1) using regex with * quantifier";
        }

        my $str2 = "2 dog:cat";
        if ($str2 =~ /(cat)/ ) {
            my $capture = $1;
            say "matched |$capture| in \$str2 ($str2) using regex without quantifier";
        }else{
            say "no match in \$str2 ($str2) using regex without quantifier";
        }

        say "-" x10;

        my $str3 = "3 cat:dog" =~ /(cat)/;
        say "\$str3: $str3";     # 1 -- original replaced by scalar value

        my $str4 = "4 cat:dog";
        if ($str4 =~ /(cat)/ ) {
            my $capture = $1;
            say "matched |$capture| in \$str4 ($str4) using regex without quantifier";
        }else{
            say "fubar on |$str4| using regex without quantifier";
        }

        =head
        $str0 1
        $str1: 1 dog:cat
        Use of uninitialized value $capture in concatenation (.) or string at F:\_wo\junk20111109.pl line 11.
        matched || in $str1 (1 dog:cat) using * quantifier in regex
        matched |cat| in $str2 (2 dog:cat) using regex without quantifier
        ----------
        $str3: 1
        matched |cat| in $str4 (4 cat:dog) using regex without quantifier
        =cut
        (This in no way deprecates choroba's discussion but is offered in the hope that the simpler example here may be more accessible).
        See my reply to graff: Re^4: Likely trivial regex question. To see what your "cat" examples actually match, run them with the /g modifier in list context:
        perl -E 'say for "cat:dog" =~ /(cat)*/g;say "=";say for "dog:cat" =~ /(cat)*/g'
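The one-liner above prints a run of empty lines that can be hard to read. As a rough sketch of the same idea, the script below records where each /g match starts and what $1 holds at that point. The helper name cat_matches is made up for this illustration; it is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect every match of
# /(cat)*/g as [ start offset, matched text, value of $1 ].
sub cat_matches {
    my ($str) = @_;
    my @found;
    while ( $str =~ /(cat)*/g ) {
        # $-[0] is the offset where the current match began
        push @found, [ $-[0], $&, $1 ];
    }
    return @found;
}

for my $str ( "cat:dog", "dog:cat" ) {
    print "$str\n";
    for my $m ( cat_matches($str) ) {
        my ( $start, $text, $cap ) = @$m;
        printf "  at %d: matched '%s', \$1 is %s\n",
            $start, $text, defined $cap ? "'$cap'" : "undef";
    }
}
```

For "dog:cat" the first reported match is the empty string at offset 0 with $1 undefined, which is the only match a plain (non-/g) =~ ever sees. The "cat" at offset 4 only shows up because /g keeps going.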
      Sorry, but I can't explain why.
      Because the .* before "vodka" eats "vodka" as well, as you can easily see if you add () around .*:
      use feature 'say';
      my $re = qr/beer=(\d{2})(.*)(?:vodka=(\d{2}))?(.*)chips=(\d{3})/;
      my $str1 = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";
      my $str2 = "beer=10&otherstuff&juice=20&otherstuff&chips=100";
      say for $str1 =~ /$re/g;
      say "==";
      say for $str2 =~ /$re/g;
      Changing the first .* to .*? does not help, either, because then the second .* eats "vodka".
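One possible single-regex answer to the original question, offered as a sketch rather than the only way: move a lazy .*? inside the optional group. The greedy ? then has to try "skip ahead to vodka=NN" before it is allowed to match zero times, so neither surrounding .* gets a chance to swallow "vodka" first. The helper name order_counts is invented for this example; the sample strings are the ones used in the thread.

```perl
use strict;
use warnings;

# The lazy .*? sits inside the optional group, so the engine attempts
# to reach "vodka=NN" first and only falls back to skipping the group.
my $re = qr/beer=(\d{2})(?:.*?vodka=(\d{2}))?.*chips=(\d{3})/;

# Hypothetical helper: returns (beer, vodka, chips) in list context;
# vodka is undef when the string has no vodka= field before chips=.
sub order_counts {
    my ($str) = @_;
    return $str =~ $re;
}

my @strs = (
    "beer=10&otherstuff&vodka=20&otherstuff&chips=100",
    "beer=10&otherstuff&juice=20&otherstuff&chips=100",
);

for my $str (@strs) {
    my @caps = order_counts($str);
    print join( ",", map { defined $_ ? $_ : "undef" } @caps ), "\n";
}
```

With the two sample strings this prints 10,20,100 and then 10,undef,100. Note that a vodka= field appearing after chips= is not captured, because the rest of the pattern could not match after it.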
Re^3: Likely trivial regex question
by vinian (Beadle) on Nov 11, 2011 at 14:48 UTC
    Hi, you can add use re 'debug'; and look at the output to see what happens when the string is matched.
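A minimal sketch of that suggestion, using graff's pattern and the first sample string from the thread. The pragma writes its compilation and matching trace to STDERR, so only the final print goes to STDOUT.

```perl
use strict;
use warnings;
use re 'debug';    # trace regex compilation and matching on STDERR

my $str = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";
my $re  = qr/beer=(\d{2}).*(?:vodka=(\d{2}))?.*chips=(\d{3})/;

# List context returns the three capture groups; an unused group is undef.
my @caps = $str =~ $re;
print "vodka capture: ", ( defined $caps[1] ? $caps[1] : "undef" ), "\n";
```

In the trace you can follow the first .* running to the end of the string and backtracking until chips= fits, at which point the optional vodka group has nothing left to match.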