Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings. I am a PERL programmer (there, I've said it) who can't fight his way out of a regular-expression wet paper bag. It's not that I don't understand them; they are incredibly powerful, but with my time constraints their finer points seem to be hiding from me. So I thought I would ask here.

Currently, I write code that processes text messages sent from network devices (typically SNMP traps). Recently our dear friend Cisco upgraded their VPN concentrator IOS and "fixed" a bunch of messages they had previously left inconsistent (similar messages had slight syntactical variations).

My program originally parsed the message using split in this fashion:

($KEY, $DATE, $TIMESTAMP, $SEV, $LOG_NUM, $RPT, $IP_ADDR, $H1, $H2, $USER, $H3, $H4, $H5, $H6, $H7, $H8, $H9, $H10, $H11, $H12, $H13, $DURATION) = split(/ /, $VAR2);

Now I find myself stuck because my parsing routine isn't nearly smart enough to handle the new message formats. Basically, I am looking for a regular expression technique to pick out the elements I need from the message without having to split the string into pieces (because the number of pieces now fluctuates depending on the message). Here are two sample messages (an old one and a new one):

[2] private.enterprises.3076.2.1.4.4.15.22 (OctetString): 65479332 04/07/2004 04:15:21.980 SEV=4 AUTH/22 RPT=8622 User silk01 connected
Here is the new message; notice the slight deviations:
[2] private.enterprises.3076.2.1.4.4.15.22 (OctetString): 1064993 04/12/2004 01:02:31.890 SEV=4 AUTH/22 RPT=362 User [schk01] Group [cisco3015] connected, Session Type: IPSec

My parsing isn't flexible enough to handle the shifting data, and the code administration for my current routine is ugly. Can someone suggest an alternative using regular expressions that would parse out the fields I need? In the example above, I need the user ID, the group, the status ("connected") and the session type. I can detect whether a message is the new version or the old one, so I can handle them differently in code. I don't need a regex that handles both.

Thanks in advance, I'm a little over my head here.

20040414 Edit by BazB: Changed title from 'Regular Expressions Hate Me'

Replies are listed 'Best First'.
Re: Regex for IOS messages
by kvale (Monsignor) on Apr 12, 2004 at 19:09 UTC
    How about
    my ($user, $group, $connected, $type) = $record =~ /^User \[(\w+)\] Group \[(\w+)\] (\w+), Session Type: +(\w+)$/;

    -Mark
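A minimal sketch of how this kind of list-context match might be wrapped and checked. It assumes `$record` holds only the `User ...` portion of the message, and the sub name `parse_user_line` is hypothetical, not something from the thread. A failed match in list context returns an empty list, so the captures come back undefined unless the result is tested:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the four captures,
# or an empty list when the pattern does not match.
sub parse_user_line {
    my ($record) = @_;
    return $record =~ /^User \[(\w+)\] Group \[(\w+)\] (\w+), Session Type: +(\w+)$/;
}

# Assumption: the record contains only the "User ..." part of the trap.
my $record = 'User [schk01] Group [cisco3015] connected, Session Type: IPSec';

# A list assignment in boolean context yields the number of values returned,
# so the condition is false when nothing matched.
if ( my ($user, $group, $connected, $type) = parse_user_line($record) ) {
    print "$user | $group | $connected | $type\n";
}
else {
    print "no match\n";
}
```

Testing the assignment this way avoids printing empty fields when a message has an unexpected shape.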

      Mark, Thanks, I will give this a try. If it's a good valid example, I'll be able to run with it. Thanks for the speedy reply. Results soon.
      Okay, there is still a problem. There are two versions of the "new" message coming in, one of them works with the regex one of them does not:

      $VAR2 = 7965744 04/13/2004 08:55:23.410 SEV=4 AUTH/22 RPT=1617 User [arug01] Group [cisco3015] connected, Session Type: IPSec
      $VAR2 = 7973668 04/13/2004 08:58:22.630 SEV=4 AUTH/22 RPT=1618 User [192.128.133.41] Group [192.128.133.41] connected, Session Type: IPSec/LAN-to-LAN
      Notice the lack of a space in front of User in the second example. Here is the regex I am currently using; can you suggest modifications? (I am taking these apart to learn them better as I get responses, so I hope not to ask the same stupid question twice.)
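One likely reason the second message fails, offered as an assumption since the thread does not confirm it: `\w` matches only letters, digits, and underscore, so it cannot match the dots in 192.128.133.41 or the `/` and `-` in IPSec/LAN-to-LAN. A minimal sketch that loosens the character classes and tolerates variable spacing follows; the sub name `parse_new_message` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: [^\]]+ takes everything up to the closing bracket,
# and \S+ takes the session type up to the next whitespace.
# \s* before each "[" allows the space there to be missing.
sub parse_new_message {
    my ($msg) = @_;
    return $msg =~ /User\s*\[([^\]]+)\]\s+Group\s*\[([^\]]+)\]\s+(\w+),\s+Session Type:\s+(\S+)/;
}

for my $msg (
    '7965744 04/13/2004 08:55:23.410 SEV=4 AUTH/22 RPT=1617 User [arug01] Group [cisco3015] connected, Session Type: IPSec',
    '7973668 04/13/2004 08:58:22.630 SEV=4 AUTH/22 RPT=1618 User [192.128.133.41] Group [192.128.133.41] connected, Session Type: IPSec/LAN-to-LAN',
) {
    my ($user, $group, $status, $type) = parse_new_message($msg)
        or next;    # skip records that do not match
    print "$user | $group | $status | $type\n";
}
```

Matching "anything but the closing bracket" is more forgiving than listing the allowed characters, at the cost of accepting values that a stricter pattern would reject.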
Re: Regex for IOS messages
by davido (Cardinal) on Apr 13, 2004 at 06:41 UTC
    This followup isn't in direct answer to your question. But I wanted to respond to the issue expressed in the title of the node: "Regular Expressions Hate Me".

    First, don't take it personally. ;) They don't actually hate you, they just haven't taken the time to get to know you well enough yet.

    Joking aside, if you (like most people, myself included) occasionally feel inadequate in the face of serious regex challenges, you owe it to yourself to pick up the "Owls book", Mastering Regular Expressions from O'Reilly and Associates, at the store or library.

    There are some good reviews of this book in the Monastery's Reviews section. By the time you finish this book you'll find yourself dreaming in pattern matches.


    Dave

      No, they really do hate me. (You should see the hurtful spam they send) Just kidding. I will check out the book although if you are like me, you probably don't have time to read the book before you finish the project. Until then I'll have to beg for help here. - Scott
        Sorry, still learning how to use this board. Replied to the wrong message and left out the regex. Hope you find it anyway.
        my ($user, $group, $connected, $type) = $VAR2 =~ /User \[(\w+)\] Group \[(\w+)\] (\w+), Session Type: +(\w+)$/;
Re: Regex for IOS messages
by Anonymous Monk on Apr 12, 2004 at 20:17 UTC
    Mark, I tried your example without success. Here is what the output from my program shows:

    >>> | | |
    $VAR2 = 4980519 04/12/2004 21:07:37.440 SEV=4 AUTH/22 RPT=1151 User [agra02] Group [cisco3015] connected, Session Type: IPSec
    As you can see, the ">>>" in the first line marks where the extracted values should have been printed. The second line shows the contents of $VAR2. One problem might be the line break occurring before the word "User". Here is the actual text message:
    1081800476 3 Mon Apr 12 15:07:56 2004 test-vpn.mydomain.com u Trap: generic 6 specific 0 args (3): [1] mgmt.mib-2.system.sysUpTime.0 (Ticks): 10825222
    1081800476 3 Mon Apr 12 15:07:56 2004 test-vpn.mydomain.com u [2] private.enterprises.3076.2.1.4.4.15.22 (OctetString): 4980519 04/12/2004 21:07:37.440 SEV=4 AUTH/22 RPT=1151
    1081800476 3 Mon Apr 12 15:07:56 2004 test-vpn.mydomain.com u User [agra02] Group [cisco3015] connected, Session Type: IPSec
    1081800476 3 Mon Apr 12 15:07:56 2004 test-vpn.mydomain.com u [3] private.enterprises.3076.2.1.2.4.1.1 (OctetString): AUTH/22
    Here is how I coded it from your example:
    if ($ARGUMENTS==6) {
        ($KEY, $DATE, $TIMESTAMP, $SEV, $LOG_NUM, $RPT, $H1, $H2, $USER, $H3, $GROUP, $H4, $H5, $H6, $TYPE) = split(/ /, $VAR2);
        $USER =~ s/\[//;
        $USER =~ s/\]//;
        $GROUP =~ s/\[//;
        $GROUP =~ s/\]//;
        # Old code handling ends here. This is the new piece you suggested.
        my ($user, $group, $connected, $type) = $VAR2 =~ /^User \[(\w+)\] Group \[(\w+)\] (\w+), Session Type: +(\w+)$/;
        print TRAPDATA "\n>>> $user | $group | $connected | $type\n";
        print TRAPDATA "\$VAR2 = $VAR2\n";
    }
      The problem is that 'User...' is not on a line by itself, but is part of a larger record, all on one line. In that case, you don't want the beginning-of-line anchor ^. Just allow it to match anywhere on the line, i.e. let it float:
      my ($user, $group, $connected, $type) = $VAR2 =~ /User \[(\w+)\] Group \[(\w+)\] (\w+), Session Type: +(\w+)/;
      If you would like to learn more about the regex features I am using in this example, check out the tutorial perlrequick.

      -Mark
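The difference between the anchored and the floating pattern can be seen in a short sketch. The sample line is the `$VAR2` value quoted earlier in the thread; the variable names `@anchored` and `@floating` are illustrative only:

```perl
use strict;
use warnings;

# $VAR2 as shown in the program output above: one line, with "User"
# appearing partway through the record.
my $line = '4980519 04/12/2004 21:07:37.440 SEV=4 AUTH/22 RPT=1151 '
         . 'User [agra02] Group [cisco3015] connected, Session Type: IPSec';

# Anchored: ^ requires "User" at the very start of the string, so this
# match fails and the list of captures is empty.
my @anchored = $line =~ /^User \[(\w+)\] Group \[(\w+)\] (\w+), Session Type: +(\w+)$/;

# Floating: without ^ the pattern may start matching anywhere in the string.
my @floating = $line =~ /User \[(\w+)\] Group \[(\w+)\] (\w+), Session Type: +(\w+)/;

print 'anchored captures: ', scalar(@anchored), "\n";
print "floating captures: @floating\n";
```

With this input the anchored match returns no captures, while the floating match returns the user, group, status, and session type.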

        Mark, thanks, that did the trick. You are my new best friend.