Likely trivial regex question

moodywoody has asked for the wisdom of the Perl Monks concerning the following question:

Re: Likely trivial regex question by ikegami (Patriarch) on Nov 09, 2011 at 06:19 UTC
Is that a URL-encoded form? `use feature qw( say ); use URI qw( ); my %fields = URI->new('?'.$str)->query_form(); say "fields present" if exists($fields{ beer }) && exists($fields{ vodka }) && exists($fields{ chips }); say "matches" if ($fields{ beer } // "") =~ /^(?:[0-9]{2})\z/ && ($fields{ vodka } // "") =~ /^(?:[0-9]{2})\z/ && ($fields{ chips } // "") =~ /^(?:[0-9]{3})\z/;`
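For readers without the URI module installed, the same idea can be sketched with core Perl only. This is an illustration, not ikegami's code: the sample string is borrowed from later in the thread, the optional handling of `vodka` is an assumption based on the requirement stated elsewhere in the thread, and no percent-decoding is performed (which `query_form` would do).

```perl
use strict;
use warnings;

# Sample input borrowed from elsewhere in this thread (an assumption here).
my $str = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";

# Split into key/value pairs; a bare token such as "otherstuff" gets an
# undef value. NOTE: no percent-decoding, unlike URI's query_form.
my %fields;
for my $pair ( split /&/, $str ) {
    my ( $key, $value ) = split /=/, $pair, 2;
    $fields{$key} = $value;
}

# beer and chips are required; vodka is treated as optional.
my $ok = ( $fields{beer}  // "" ) =~ /^[0-9]{2}\z/
      && ( $fields{chips} // "" ) =~ /^[0-9]{3}\z/;
my $vodka = $fields{vodka};    # undef when the key is absent

print $ok ? "matches\n" : "doesn't match\n";
print "vodka=", ( defined $vodka ? $vodka : "(absent)" ), "\n";
```

Working from a hash sidesteps the ordering and greediness questions that a single regex raises, at the cost of ignoring the order in which the fields appear.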
Re^2: Likely trivial regex question by moodywoody (Novice) on Nov 09, 2011 at 07:04 UTC
No, the original data are Tomcat logs with URL requests.
Re^3: Likely trivial regex question by ikegami (Patriarch) on Nov 09, 2011 at 07:48 UTC
What you posted aren't logs. You do seem to confirm that these are URL fragments, though. Are you saying you don't know how to extract these URL fragments from the logs?
Re^4: Likely trivial regex question by moodywoody (Novice) on Nov 09, 2011 at 08:02 UTC
Re: Likely trivial regex question by ansh batra (Friar) on Nov 09, 2011 at 06:32 UTC
`my $str1 = "beer=10&otherstuff&vodka=20&otherstuff&chips=100"; my $str2 = "beer=10&otherstuff&juice=20&otherstuff&chips=100"; my $re = qr/beer=(\d{2}).*((vodka|juice)=(\d{2})).*chips=(\d{3})/; { $str1 =~ /$re/ ? print "matches\n" : print "doesn't match\n"; print "$1 $2 $3 $4 \n"; } { $str2 =~ /$re/ ? print "matches\n" : print "doesn't match\n"; print "$1 $2 $3 $4 \n"; }` Output: `matches 10 vodka=20 vodka 20 matches 10 juice=20 juice 20` Is that what you want?
Re^2: Likely trivial regex question by moodywoody (Novice) on Nov 09, 2011 at 07:15 UTC
For example, I thought this would be it: `$re = qr/beer=(\d{2}).*(vodka=(\d{2})){0,1}.*chips=(\d{3})/;` But it doesn't capture $2 and $3 in str1 (why?).
Re^3: Likely trivial regex question by graff (Chancellor) on Nov 09, 2011 at 09:54 UTC
Well, first off, by putting parens around "vodka=(\d{2})", you've created another capture, but the approach that might have been "correct" also doesn't work: `my $re = qr/beer=(\d{2}).*(?:vodka=(\d{2}))?.*chips=(\d{3})/;` That "matches" both strings, but doesn't capture the number that follows "vodka" in the first string. Sorry, but I can't explain why. So I'd be inclined to take a different approach: `#!/usr/bin/perl use strict; use warnings; my @strs = ( "beer=10&otherstuff&vodka=20&otherstuff&chips=100", "beer=10&otherstuff&juice=20&otherstuff&chips=100" ); my @targets = qw/beer vodka chips/; for ( @strs ) { my @matched; for my $target ( @targets ) { push @matched, $1 if ( /$target=(\d+)/ ); } if ( @matched ) { print "matched: @matched\n"; } }` UPDATE: Here's a variant on that approach, which I think is closer to what your own snippet would actually do if it worked (in case that's really what you want): `my @targets = qw/beer=(\d{2}) (vodka=(\d{2})) chips=(\d{3})/; for ( @strs ) { my @matched; for my $target ( @targets ) { push @matched, ( /$target/ ); } if ( @matched ) { print "matched: @matched\n"; } }`
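A likely explanation, offered here as an assumption rather than something the surviving replies state: if the patterns above use a greedy `.*` between the fields, the first `.*` initially swallows the rest of the string and then backs off only far enough for `chips=` to match. Because the vodka group is optional, the engine is content to match it zero times. The sketch below compares that greedy pattern with a hypothetical variant that moves a lazy `.*?` inside the optional group.

```perl
use strict;
use warnings;

my $str = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";

# Greedy: the first .* runs to the end of the string, then backs off just
# enough for "chips=100" to match; the optional vodka group stays empty.
my $greedy = qr/beer=(\d{2}).*(?:vodka=(\d{2}))?.*chips=(\d{3})/;

# Lazy skip inside the optional group: the group is attempted first, and
# .*? is extended only until "vodka=" is found.
my $lazy = qr/beer=(\d{2})(?:.*?vodka=(\d{2}))?.*chips=(\d{3})/;

my @greedy = $str =~ $greedy;
my @lazy   = $str =~ $lazy;

for my $result ( [ greedy => \@greedy ], [ lazy => \@lazy ] ) {
    my ( $name, $captures ) = @$result;
    print "$name: ", join( " ", map { $_ // "undef" } @$captures ), "\n";
}
# prints:
# greedy: 10 undef 100
# lazy: 10 20 100
```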
Re^4: Likely trivial regex question by moodywoody (Novice) on Nov 09, 2011 at 10:35 UTC
Re^5: Likely trivial regex question by ww (Archbishop) on Nov 09, 2011 at 12:38 UTC
Re^5: Likely trivial regex question by choroba (Cardinal) on Nov 09, 2011 at 11:00 UTC
Re^4: Likely trivial regex question by choroba (Cardinal) on Nov 09, 2011 at 10:56 UTC
Re^3: Likely trivial regex question by vinian (Beadle) on Nov 11, 2011 at 14:48 UTC
Hi, you can add `use re 'debug';` and see from its output what happens when the string is matched.
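A minimal sketch of that suggestion follows. The sample string and pattern are taken from earlier in the thread, with `.*` assumed between the fields. The `re` pragma is lexically scoped, so wrapping it in a block limits the trace to one regex; the trace itself is written to STDERR and can be long.

```perl
use strict;
use warnings;

my $str = "beer=10&otherstuff&vodka=20&otherstuff&chips=100";
my $matched;
{
    # Only regexes compiled inside this block are traced.
    use re 'debug';
    $matched = $str =~ /beer=(\d{2}).*(?:vodka=(\d{2}))?.*chips=(\d{3})/;
}
print "matched\n" if $matched;
```

Redirecting STDERR to a file (for example `perl script.pl 2> trace.txt`) makes the backtracking steps easier to read.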
Re^2: Likely trivial regex question by moodywoody (Novice) on Nov 09, 2011 at 07:09 UTC
Sorry, the example might have been suboptimal. The "juice" part is never to be captured. The string will either contain a vodka=xx part or not. If it is there, I would like the regex to capture it in $2 and $3. If it isn't there, I would like the regex to match (say "matches") and give "uninitialized" for $2 and $3, but still capture $1 and $4.
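One way to meet the requirement as stated is sketched below. It is an illustration, not a pattern taken from any reply in this thread: the optional group contains a lazy `.*?` followed by the vodka capture, so $2 and $3 are filled when `vodka=xx` is present and left undefined otherwise, while $1 and $4 are always set on a match.

```perl
use strict;
use warnings;

my @strs = (
    "beer=10&otherstuff&vodka=20&otherstuff&chips=100",
    "beer=10&otherstuff&juice=20&otherstuff&chips=100",
);

# $1 = beer, $2 = "vodka=NN" (optional), $3 = NN (optional), $4 = chips
my $re = qr/beer=(\d{2})(?:.*?(vodka=(\d{2})))?.*chips=(\d{3})/;

my @results;
for my $str (@strs) {
    if ( my @c = $str =~ $re ) {
        push @results, [@c];
        print "matches: ",
            join( " ", map { defined $_ ? $_ : "(undef)" } @c ), "\n";
    }
    else {
        print "doesn't match\n";
    }
}
# prints:
# matches: 10 vodka=20 20 100
# matches: 10 (undef) (undef) 100
```

Capturing into a list, as done here, avoids "uninitialized value" warnings that interpolating $2 and $3 directly would raise when the vodka part is absent.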