moodywoody has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

please consider
my $str1 = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";
my $str2 = "beer=10&otherstuff&juice=20&otherstuff&chips=100";
my $re = qr/beer=(\d{2}).*(vodka=(\d{2})).*chips=(\d{3})/;

{
    $str1 =~ /$re/ ? say "matches" : say "doesn't match";
    say "$1 $2 $3 $4 ";
}
{
    $str2 =~ /$re/ ? say "matches" : say "doesn't match";
    say "$1 $2 $3 $4 ";
}

I would like to modify the regex so that it "matches" both strings.
I know I have to tell this expression that the middle part (the double parens around vodka) is optional, but whatever I've tried with *+? and friends it either makes $re not match str2 or it misses $2, $3 in str1 (not initialized).

Thank you for sharing your knowledge.

Replies are listed 'Best First'.
Re: Likely trivial regex question
by ikegami (Patriarch) on Nov 09, 2011 at 06:19 UTC

    Is that a URL-encoded form?

    use URI qw( );

    my %fields = URI->new('?'.$str)->query_form();

    say "fields present"
        if exists($fields{ beer })
        && exists($fields{ vodka })
        && exists($fields{ chips });

    say "matches"
        if ($fields{ beer } // "") =~ /^(?:[0-9]{2})\z/
        && ($fields{ vodka } // "") =~ /^(?:[0-9]{2})\z/
        && ($fields{ chips } // "") =~ /^(?:[0-9]{3})\z/;
      No, the original data are tomcat logs with URL requests.
        What you posted aren't logs. You do seem to confirm that these are URL fragments though. Are you saying you don't know how to extract these URL fragments from the logs?
Re: Likely trivial regex question
by ansh batra (Friar) on Nov 09, 2011 at 06:32 UTC
    my $str1 = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";
    my $str2 = "beer=10&otherstuff&juice=20&otherstuff&chips=100";
    my $re = qr/beer=(\d{2}).*((vodka|juice)=(\d{2})).*chips=(\d{3})/;

    {
        $str1 =~ /$re/ ? print "matches\n" : print "doesn't match\n";
        print "$1 $2 $3 $4 \n";
    }
    {
        $str2 =~ /$re/ ? print "matches\n" : print "doesn't match\n";
        print "$1 $2 $3 $4 \n";
    }


    output
    matches
    10 vodka=20 vodka 20
    matches
    10 juice=20 juice 20
    Is that what you want?
      For example I thought this would be it
      $re = qr/beer=(\d{2}).*(vodka=(\d{2})){0,1}.*chips=(\d{3})/;

      but it doesn't capture $2,$3 in str1 (why?).
        Well, first off, by putting parens around "vodka=(\d{2})", you've created another capture, but the approach that might have been "correct" also doesn't work:
        my $re = qr/beer=(\d{2}).*(?:vodka=(\d{2}))?.*chips=(\d{3})/;
        That "matches" both strings, but doesn't capture the number that follows "vodka" in the first string. Sorry, but I can't explain why.
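        A likely explanation is greediness. The first .* grabs as much as it can and only gives characters back until the rest of the pattern can match. Because the vodka group is optional, the rest of the pattern can already match as soon as the first .* has backed off to just before "chips", so the engine never has a reason to back off further and try vodka. The sketch below is not from the thread: it adds one extra capture group (the variable $skipped is a name chosen here for illustration) around the first .* to show what it swallowed.

```perl
use strict;
use warnings;

my $str1 = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";

# Same pattern as above, plus an illustrative capture around the first .*
# so we can inspect what the greedy quantifier consumed.
my $re = qr/beer=(\d{2})(.*)(?:vodka=(\d{2}))?.*chips=(\d{3})/;

my ( $beer, $skipped, $vodka, $chips ) = $str1 =~ $re
    or die "no match";

print "beer:    $beer\n";
print "skipped: $skipped\n";
print "vodka:   ", ( defined $vodka ? $vodka : "(undef)" ), "\n";
print "chips:   $chips\n";
```

        Here $skipped contains the whole "vodka=20" part, which is why the optional group ends up matching zero times.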

        So I'd be inclined to take a different approach:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @strs = (
            "beer=10&otherstuff&vodka=20&otherstuff&chips=100",
            "beer=10&otherstuff&juice=20&otherstuff&chips=100"
        );
        my @targets = qw/beer vodka chips/;

        for ( @strs ) {
            my @matched;
            for my $target ( @targets ) {
                push @matched, $1 if ( /$target=(\d+)/ );
            }
            if ( @matched ) {
                print "matched: @matched\n";
            }
        }
        UPDATE: Here's a variant on that approach, which I think is closer to what your own snippet would actually do if it worked (in case that's really what you want):
        my @targets = qw/beer=(\d{2}) (vodka=(\d{2})) chips=(\d{3})/;

        for ( @strs ) {
            my @matched;
            for my $target ( @targets ) {
                push @matched, ( /$target/ );
            }
            if ( @matched ) {
                print "matched: @matched\n";
            }
        }
        Hi, you can add use re 'debug'; and look at the output to see what happens when the string is matched.
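        As a rough illustration of that suggestion, the snippet below turns on regex debugging for one of the patterns from this thread. The re pragma ships with Perl, and the trace goes to STDERR, so it can be redirected to a file if it gets long. The variable names are chosen here for illustration only.

```perl
use strict;
use warnings;
use re 'debug';    # dump the compiled regex and a match trace to STDERR

my $str1 = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";

# The pattern with the optional, non-capturing vodka group discussed above.
my $matched = $str1 =~ /beer=(\d{2}).*(?:vodka=(\d{2}))?.*chips=(\d{3})/;
my ( $beer, $vodka, $chips ) = ( $1, $2, $3 );

print "matched\n" if $matched;
print "vodka capture is ", ( defined $vodka ? $vodka : "undefined" ), "\n";
```

        In the trace, watch how far the first .* runs before the engine starts backtracking.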
      Sorry, the example might have been suboptimal. The "juice" part is never to be captured.

      The string will either contain a vodka=xx part or not. If it is there, I would like the regex to capture it in $2 and $3. If it isn't there, I would like the regex to still match (say "matches") and leave $2 and $3 uninitialized, but still capture $1 and $4.
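      One way to get that behaviour, offered as a sketch and not as the only answer: move the filler in front of vodka inside the optional group and make it non-greedy. The engine then first tries to reach "vodka=" and falls back to skipping the group only when that fails. The pattern and the @results array below are illustrative; the sketch assumes vodka=xx, when present, sits between beer= and chips=.

```perl
use strict;
use warnings;

my @strs = (
    "beer=10&otherstuff&vodka=20&otherstuff&chips=100",
    "beer=10&otherstuff&juice=20&otherstuff&chips=100",
);

# The optional group now owns the lazy filler (.*?) that leads up to vodka.
# If vodka= cannot be found, the group matches zero times and the trailing
# .* skips ahead to chips on its own.
my $re = qr/beer=(\d{2})(?:.*?(vodka=(\d{2})))?.*chips=(\d{3})/;

my @results;
for my $str (@strs) {
    if ( my @caps = $str =~ $re ) {
        push @results, [@caps];
        print "matches: ",
            join( " ", map { defined($_) ? $_ : "(undef)" } @caps ), "\n";
    }
    else {
        push @results, undef;
        print "doesn't match\n";
    }
}
```

      Under warnings, interpolating $2 and $3 directly for the second string would produce "uninitialized" warnings, so the sketch checks defined() before printing.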