in reply to Likely trivial regex question

my $str1 = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";
my $str2 = "beer=10&otherstuff&juice=20&otherstuff&chips=100";
my $re = qr/beer=(\d{2}).*((vodka|juice)=(\d{2})).*chips=(\d{3})/;
{
    $str1 =~ /$re/ ? print "matches\n" : print "doesn't match\n";
    print "$1 $2 $3 $4 \n";
}
{
    $str2 =~ /$re/ ? print "matches\n" : print "doesn't match\n";
    print "$1 $2 $3 $4 \n";
}


Output:

matches
10 vodka=20 vodka 20
matches
10 juice=20 juice 20

Is that what you want?

Re^2: Likely trivial regex question
by moodywoody (Novice) on Nov 09, 2011 at 07:15 UTC
    For example, I thought this would do it:
    $re = qr/beer=(\d{2}).*(vodka=(\d{2})){0,1}.*chips=(\d{3})/;

    but it doesn't capture $2 and $3 in $str1. Why?
      Well, first off, by putting parens around "vodka=(\d{2})", you've created another capture, but the approach that might have been "correct" also doesn't work:
      my $re = qr/beer=(\d{2}).*(?:vodka=(\d{2}))?.*chips=(\d{3})/;
      That "matches" both strings, but doesn't capture the number that follows "vodka" in the first string. Sorry, but I can't explain why.

      So I'd be inclined to take a different approach:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @strs = (
          "beer=10&otherstuff&vodka=20&otherstuff&chips=100",
          "beer=10&otherstuff&juice=20&otherstuff&chips=100"
      );
      my @targets = qw/beer vodka chips/;
      for ( @strs ) {
          my @matched;
          for my $target ( @targets ) {
              push @matched, $1 if ( /$target=(\d+)/ );
          }
          if ( @matched ) {
              print "matched: @matched\n";
          }
      }
      UPDATE: Here's a variant on that approach, which I think is closer to what your own snippet would actually do if it worked (in case that's really what you want):
      my @targets = qw/beer=(\d{2}) (vodka=(\d{2})) chips=(\d{3})/;
      for ( @strs ) {
          my @matched;
          for my $target ( @targets ) {
              push @matched, ( /$target/ );
          }
          if ( @matched ) {
              print "matched: @matched\n";
          }
      }
        Hi graff,

        Thanks for your efforts. As it happens, I'm particularly interested in the part you said you couldn't explain (the optional group not being captured).

        As a workaround I've written two regexes combined with an if:
        my $re1 = qr/beer=(\d{2}).*chips=(\d{3})/;
        my $re2 = qr/vodka=(\d{2})/;
        if ( $str =~ /$re1/ ) {
            my $m1 = $1;
            my $m2 = $2;
            if ( $str =~ /$re2/ ) {
                # use that capture and go on...
            }
        }


        This does exactly what I want; I'd just like to learn whether the same is possible with a single regex, and if not, why not.

        While reading up and googling this, I realized that I basically don't understand much about greedy/non-greedy quantifiers and/or optional groups.

        For example, I am also puzzled why
        "cat:dog" =~ /(cat)*/;
        captures "cat", but
        "dog:cat" =~ /(cat)*/;
        doesn't.
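The thread doesn't come back to this second puzzle, so here is a hedged explanation with a small sketch. A match succeeds at the leftmost position where the whole pattern can match, and (cat)* is allowed to match zero times. At position 0 of "dog:cat" it can match the empty string, so the match succeeds right there without ever reaching "cat", and $1 is left undefined. The special arrays @- and @+ hold the start and end offsets of the match, which makes the empty match visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# (cat)* may match zero copies of "cat", so an empty match at offset 0
# already counts as success; the engine never moves on to look for "cat".
for my $str ( "cat:dog", "dog:cat" ) {
    if ( $str =~ /(cat)*/ ) {
        # $-[0] and $+[0] are the start and end offsets of the whole match
        printf "%-8s matched at %d..%d, \$1 = %s\n",
            $str, $-[0], $+[0], defined $1 ? $1 : 'undef';
    }
}

# Requiring at least one "cat" makes an empty match impossible, so the
# engine has to advance past "dog:" to find one.
if ( "dog:cat" =~ /(cat)+/ ) {
    print "with +: \$1 = $1, starting at offset ", $-[0], "\n";
}
```

For "dog:cat" the first loop should report a zero-length match at 0..0 with $1 undefined, while the + version finds "cat" at offset 4.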

        Cheers
        Sorry, but I can't explain why.
        Because the greedy .* before the optional group also eats "vodka=20". Once it has, the optional group simply matches nothing and the overall match still succeeds. You can easily see this if you add () around each .*:
        use feature 'say';

        my $re = qr/beer=(\d{2})(.*)(?:vodka=(\d{2}))?(.*)chips=(\d{3})/;
        my $str1 = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";
        my $str2 = "beer=10&otherstuff&juice=20&otherstuff&chips=100";
        say for $str1 =~ /$re/g;
        say "==";
        say for $str2 =~ /$re/g;
        Changing the first .* to .*? does not help, either, because then the second .* eats "vodka".
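One possible single-regex fix, offered as a sketch rather than anything posted in this thread, is to move the .*? that skips ahead inside the optional group. Because ? is greedy, the engine first tries to find a "vodka=NN" further along; only if that fails does the group match nothing. The captures come out numbered as moodywoody asked: $1 for beer, $2/$3 for vodka, $4 for chips. Caveat: it only picks up a vodka=NN that appears after beer= and before chips=.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The lazy .*? lives INSIDE the optional group, so the whole
# "skip ahead and grab vodka=NN" step is attempted first and is only
# abandoned (group matches nothing) when no vodka=NN precedes chips=.
my $re = qr/beer=(\d{2})(?:.*?(vodka=(\d{2})))?.*chips=(\d{3})/;

my @strs = (
    "beer=10&otherstuff&vodka=20&otherstuff&chips=100",
    "beer=10&otherstuff&juice=20&otherstuff&chips=100",
);

for my $str (@strs) {
    # In list context the match returns all four captures; groups that
    # did not participate come back as undef.
    if ( my @caps = $str =~ $re ) {
        print "matches: ",
            join( " ", map { defined($_) ? $_ : 'undef' } @caps ), "\n";
    }
    else {
        print "doesn't match\n";
    }
}
```

This should print "matches: 10 vodka=20 20 100" for the first string and "matches: 10 undef undef 100" for the second.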
      Hi, you can add use re 'debug'; and look at the output to see what happens when the string is matched.
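A minimal sketch of that suggestion, applied to graff's non-capturing version of the regex. The trace of how the regex is compiled and matched is written to STDERR, while the script's own print goes to STDOUT. In the trace you should be able to follow the first .* running to the end of the string and then giving characters back until chips= can match, by which point vodka=20 is already behind it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use re 'debug';   # emit regex compile/match trace on STDERR

my $str = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";

# graff's version: non-capturing optional group after a greedy .*
if ( $str =~ /beer=(\d{2}).*(?:vodka=(\d{2}))?.*chips=(\d{3})/ ) {
    # $2 stays undef because the optional group matched nothing
    print "1: $1, 2: ", ( defined $2 ? $2 : 'undef' ), ", 3: $3\n";
}
```

Running it as perl script.pl 2>trace.txt keeps the (fairly long) trace separate from the result; perl -Mre=debug script.pl has the same effect as the use re line.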
Re^2: Likely trivial regex question
by moodywoody (Novice) on Nov 09, 2011 at 07:09 UTC
    Sorry, the example might have been suboptimal. The "juice" part is never to be captured.

    The string either contains a vodka=xx part or it doesn't. If it is there, I'd like the regex to capture it in $2 and $3. If it isn't, I'd like the regex to match anyway (print "matches"), leave $2 and $3 undefined (so they show up as "uninitialized"), and still capture $1 and $4.