My regex is too greedy!

hans_moleman has asked for the wisdom of the Perl Monks concerning the following question:

Greetings all.

I'm writing a script to parse syslog messages from a Cisco VPN Concentrator. Each line contains a number of fields, including a "message field". I'm using regular expressions to grab the data I need from these message fields based on the message type.

In one particular case (Administrative user login if you want to know) the lines look like this:

Mar  3 11:29:11 10.20.20.2 8194 03/03/2003 13:15:37.330 SEV=5 AUTH/36 RPT=29  User [ admin ] Protocol [ Telnet ] attempted ADMIN logon.. Status: <ACCESS GRANTED> !

I want to grab the user name from the message string. After using split() to isolate the different fields I want, I tried using a regular expression to get the name:

$user=~s/.+\[[ ]+(.+)[ ]+\].+/$1/;

Unfortunately, this regular expression returns the second element between brackets. In the sample line above, it would return "Telnet". I ended up finding a solution using split:

(undef,$user,undef)=split(/\[ | \]/,$message,3);

Because, as always, TMTOWTDI. However, I'm curious as to what is wrong with my regular expression. I went to the Camel and tried several variations of the regular expression above, but no joy. Anyone care to shed some light on the situation?

Re: My regex is too greedy! by Mr. Muskrat (Canon) on Mar 05, 2003 at 16:25 UTC
Simplify!

my $message = 'Mar 3 11:29:11 10.20.20.2 8194 03/03/2003 13:15:37.330 SEV=5 AUTH/36 RPT=29 User [ admin ] Protocol [ Telnet ] attempted ADMIN logon.. Status: <ACCESS GRANTED> !';
my ($user) = $message =~ /\[\s+(\w+)\s+\]/;
print $user,$/;
Re: My regex is too greedy! by blokhead (Monsignor) on Mar 05, 2003 at 16:28 UTC
It's your `.+` at the beginning: it will always match as much as it can. In this case, it can match all the way up to the Protocol section, so it does. Change it to `.+?` and you'll be on the right track. The additional question mark tells the `.+` to match as little as it can.

However, now the capturing `(.+)` will try to grab the largest possible match, so it will end up with `admin ] Protocol [ Telnet`. I think the negated character class `[^\]]+` would be a better choice (grab as many characters other than `]` as possible) to match everything inside the brackets without risk of flowing over into the next set of brackets.

`s/.+?\[\s+([^\]]+)\s+\].+/$1/;`

This works for me and results in 'admin'. Also, as Mr. Muskrat demonstrates, this is not quite the right realm for an s///; you should probably just use a match and capture what you want.

blokhead
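To see the three behaviours side by side, here is a minimal sketch. It assumes `$message` holds only the message field of the sample line (the exact contents of the original poster's `$message` are not shown in the thread), and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: $message is just the message field from the sample line.
my $message = 'User [ admin ] Protocol [ Telnet ] attempted ADMIN logon.. '
            . 'Status: <ACCESS GRANTED> !';

# 1. Original pattern: the greedy leading .+ swallows the whole string and
#    backtracks only as far as the LAST "[", so the capture is "Telnet".
(my $greedy_lead = $message) =~ s/.+\[[ ]+(.+)[ ]+\].+/$1/;

# 2. Lazy leading .+? stops at the FIRST "[", but the greedy capture (.+)
#    now runs on to the last " ]" it can still match.
(my $lazy_lead = $message) =~ s/.+?\[[ ]+(.+)[ ]+\].+/$1/;

# 3. Lazy lead plus a negated character class: the capture cannot cross "]".
(my $negated = $message) =~ s/.+?\[\s+([^\]]+)\s+\].+/$1/;

print "greedy lead:   $greedy_lead\n";   # Telnet
print "lazy lead:     $lazy_lead\n";     # admin ] Protocol [ Telnet
print "negated class: $negated\n";       # admin
```

The copy-then-substitute idiom `(my $copy = $message) =~ s/.../.../` is used only so that each pattern runs against an unmodified string.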
Re: Re: My regex is too greedy! by Nkuvu (Priest) on Mar 05, 2003 at 17:22 UTC
Would this be a more efficient regex without the `.+?` at the beginning? You aren't, after all, capturing it...
Re: Re: Re: My regex is too greedy! by blokhead (Monsignor) on Mar 05, 2003 at 18:07 UTC
If we're doing just a m// match then you are certainly right. I was using s///, so I needed to match the whole string to replace it.

blokhead
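As a rough illustration of the plain-match version, the sketch below drops the leading `.+?` and the trailing `.+` entirely. It again assumes `$message` holds the message field from the sample line; the `/g` variant and the name `@bracketed` are additions for illustration, not something proposed in the thread.

```perl
use strict;
use warnings;

# Assumption: $message is just the message field from the sample line.
my $message = 'User [ admin ] Protocol [ Telnet ] attempted ADMIN logon.. '
            . 'Status: <ACCESS GRANTED> !';

# In list context a match returns its captures. The pattern is unanchored,
# so the engine finds the leftmost "[ ... ]" without any leading .+? at all.
my ($user) = $message =~ /\[\s+([^\]]+)\s+\]/;

# With /g in list context, every bracketed value is returned in order.
my @bracketed = $message =~ /\[\s+([^\]]+)\s+\]/g;

print "user: $user\n";        # user: admin
print "all:  @bracketed\n";   # all:  admin Telnet
```

Because nothing has to be matched just to be thrown away, the pattern only describes the part of the string that matters.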
Re: Re: Re: Re: My regex is too greedy! by Nkuvu (Priest) on Mar 05, 2003 at 19:50 UTC
Re: My regex is too greedy! by kelan (Deacon) on Mar 05, 2003 at 18:12 UTC
Why not just:

`my $user = (split(/ +/, $message))[12];`

or something similar, depending on how you've done the previous splits. This should work as long as the message is in a standard format.

kelan
Perl6 Grammar Student
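A small sketch of the positional approach, assuming the string being split is the complete syslog line from the question (here called `$line`, a made-up name). Index 12 only holds for that layout; the sanity check on the neighbouring fields is an addition for illustration.

```perl
use strict;
use warnings;

# Assumption: $line is the whole sample syslog line, not a pre-split field.
my $line = 'Mar  3 11:29:11 10.20.20.2 8194 03/03/2003 13:15:37.330 '
         . 'SEV=5 AUTH/36 RPT=29  User [ admin ] Protocol [ Telnet ] '
         . 'attempted ADMIN logon.. Status: <ACCESS GRANTED> !';

# Split on runs of spaces, so the double space after "Mar" and after
# "RPT=29" does not produce empty fields.
my @fields = split / +/, $line;

# Fields 10 and 11 should be "User" and "[" if the layout is as expected;
# only then is field 12 trusted as the user name.
my $user = (@fields > 12 && $fields[10] eq 'User' && $fields[11] eq '[')
         ? $fields[12]
         : undef;

print defined $user ? "user: $user\n" : "unexpected format\n";   # user: admin
```

The trade-off is that a fixed index breaks as soon as a field is added, removed, or contains a space, whereas the bracket-based regex keys on the text around the name.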