blue_cowdawg has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,
Consider the following log file entry:

Aug 22 11:46:27 masterudp003210uds.netops.msnyuhealth.org 148526: Aug +22 15:46:26 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.4.12.2 +53 -> 10.7.151.48 :8/0:, 1 packet
Following the advice of my fellow monks I broke up my regexes using qr and came up with this list of regexes:
my $dtg=qr@[A-Z][a-z]+\s\d+\s\d+:\d+:\d+@;
my $thingy=qr([\.\d]+);
my $tz=qr([A-Z]{3});
my $ipaddr=qr@\d+\.\d+\.\d+\.\d+@;
my $fqdn=qr@[a-zA-Z\-\.]+@;
my $timezone = qr@[A-Z]+@;
my $fragger = qr@(\%SEC-6-IPACCESSLOGP|\%SEC-6-IPACCESSLOGDP)@;
my $list=qr@list\s(\d+)@;
my $protocol = qr@(tcp|udp|icmp)@;
my $ip_with_port=qr@($ipaddr):(\d+):@;
my $arrow = qr @\-\>@;
my $time_lapse=qr@\d+d\d+h@; # Something like 6d45h <sigh!>
my $metric_ex=qr(\d+:);
my $dtg1="Aug 22 11:46:27";
my $month="Aug";
my $monthDay="Aug 22";
my $time="11:46:27";
my $matchMonth=qr([A-Z][a-z]+);
my $dateMatch=qr($matchMonth\s\d+);
my $matchTime=qr(\d+:\d+:\d+);
No doubt there are useless lines up there, as I cut and pasted this from some unit testing that I was doing. So far so good... right? Well, here is a test snippet that I wrote:
printf "Full String: %s\n", (
    $full_string =~ m@
        ($dateMatch\s$matchTime)\s
        ($fqdn|$ipaddr)\s
        $metric_ex\s
        ($dateMatch\s$matchTime)\s
        ($timezone):\s
        $fragger:\s
        list\s(\d+)\s
        denied\s($protocol)\s
        ($ipaddr)\s
        $arrow\s
        ($ipaddr)\s
        :\d+\/\d+\:,\s
        (\d+)\spacket
    @x ? "OK" : "FAILED"
);
It fails to match... Anyone have an idea why?

A very tired cattle dog style Perl Monk going to bed now.


Peter @ Berghold . Net

Seize the cow! Bite the day!

Nobody expects the Perl inquisition!

Test the code? We don't need to test no stinkin' code!
All code posted here is as is where is unless otherwise stated.

Brewer of Belgian style Ales

Replies are listed 'Best First'.
Re: CISCO Log file pattern matching (again!)
by Kanji (Parson) on Aug 25, 2003 at 03:38 UTC

    You neglected to add digits to $fqdn, which your hostname contains.

    Instead you should use something like [A-Za-z0-9\-\.]+, but if you still find your regexp fails, a basic diagnosis would be to remove components from the regexp until it does work, and then work out why the last one you removed failed to match...
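That diagnosis can also be run in the other direction: grow the pattern one component at a time and report the first component at which the match stops succeeding. The following is only a sketch of that idea. The helper names `pieces_with` and `first_failing_piece` and the piece labels are made up for illustration, the sample line assumes the source address is really 10.4.12.253, and each piece carries its trailing whitespace so that a partial hostname match cannot slip through.

```perl
use strict;
use warnings;

# Sample line from the question with the display wrap removed
# (assumption: the source address is 10.4.12.253).
my $line = 'Aug 22 11:46:27 masterudp003210uds.netops.msnyuhealth.org 148526: '
         . 'Aug 22 15:46:26 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp '
         . '10.4.12.253 -> 10.7.151.48 :8/0:, 1 packet';

my $stamp      = qr/[A-Z][a-z]+\s\d+\s\d+:\d+:\d+/;
my $ipaddr     = qr/\d+\.\d+\.\d+\.\d+/;
my $fqdn_orig  = qr/[a-zA-Z\-\.]+/;       # as posted: no digits allowed
my $fqdn_fixed = qr/[A-Za-z0-9\-\.]+/;    # with digits added

# Hypothetical helper: the leading components of the pattern as
# [label, regex] pairs, built around whichever $fqdn is passed in.
sub pieces_with {
    my ($fqdn) = @_;
    return (
        [ stamp  => qr/$stamp\s/ ],
        [ host   => qr/(?:$fqdn|$ipaddr)\s/ ],
        [ seqno  => qr/\d+:\s/ ],
        [ rstamp => qr/$stamp\s/ ],
    );
}

# Hypothetical helper: append the pieces one by one to an anchored
# pattern and return the label of the first piece that breaks the match.
sub first_failing_piece {
    my ($text, @pieces) = @_;
    my $so_far = '';
    for my $piece (@pieces) {
        my ($label, $re) = @$piece;
        $so_far .= $re;
        return $label unless $text =~ /^$so_far/;
    }
    return;    # every piece matched
}

for my $case ([ original => $fqdn_orig ], [ fixed => $fqdn_fixed ]) {
    my ($name, $fqdn) = @$case;
    my $failed = first_failing_piece($line, pieces_with($fqdn));
    printf "%-8s first failing piece: %s\n",
        $name, defined $failed ? $failed : 'none';
}
```

With the posted character class the sketch should point at the host component; with digits added it should report none.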

        --k.


            You neglected to add digits to $fqdn, which your hostname contains.
      ARRRGH! That explains why the regex worked in my test code but not in the live code.

      Note to self: cut and paste is a good thing... transposing by hand can be a bad thing...

      Thanks for spotting that!



Re: CISCO Log file pattern matching (again!)
by bobn (Chaplain) on Aug 25, 2003 at 05:01 UTC

    I've only done limited Cisco log parsing, but since the format is very consistent up until the %Whatever, why not start with a split(' ') and then parse the pieces, a la:

    # untested
    while (<>) {
        # the first ten fields have a fixed layout; the rest of the line lands in $msg
        my ($mon, $day, $time, $rtr, $seqno, $rmon, $rday, $rtime, $rtz, $code, $msg)
            = split(' ', $_, 11);
        next unless $code =~ m!\%SEC-6-IPACCESSLOGP:|\%SEC-6-IPACCESSLOGDP:!;
        # process stuff here, a la:
        my (undef, $listno, $act, $proto, $src, undef, $trget, $other, $cnt)
            = split(' ', $msg);
    }
    Now the parts that are consistent are split out. You'll still have to deal with the message-type dependent stuff - which should be in $msg - there's no help for that, but now you can attack the individual pieces without hurting your brain so badly.

    And if all the messages that you're interested in are ACL violations, you can let split break all the pieces out for you.
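For the ACL-only case, a single split can name every field through a hash slice. This is a sketch rather than tested production code: it assumes every line has exactly the layout of the sample in the question (with the source address unwrapped to 10.4.12.253), the field names are illustrative labels rather than anything Cisco defines, and lines with a different field count are simply rejected.

```perl
use strict;
use warnings;

# Illustrative field names; 'kw_list' and 'arrow' are throwaway slots
# for the literal words "list" and "->".
my @names = qw(mon day time router seqno rmon rday rtime rtz code
               kw_list listno action proto src arrow dst detail count unit);

# Hypothetical helper: returns a hash ref of named fields, or nothing
# when the line does not look like the sample ACL entry.
sub parse_acl_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return unless @f == @names;                          # unexpected layout
    return unless $f[9] =~ /^%SEC-6-IPACCESSLOGD?P:$/;   # not an ACL entry
    my %rec;
    @rec{@names} = @f;
    s/:$// for @rec{qw(seqno rtz code)};                 # trim trailing colons
    return \%rec;
}

my $line = 'Aug 22 11:46:27 masterudp003210uds.netops.msnyuhealth.org 148526: '
         . 'Aug 22 15:46:26 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp '
         . '10.4.12.253 -> 10.7.151.48 :8/0:, 1 packet';

my $rec = parse_acl_line($line);
print "$rec->{rtime} $rec->{src} -> $rec->{dst} ($rec->{proto})\n" if $rec;
```

Both date stamps stay available in the hash, so anything that needs them later does not have to re-parse the line.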

    (And anyhow, aren't you really looking for "-> IP_addr(69)", at least this week? I know I was on Friday - ALL day.)

    --Bob Niederman, http://bob-n.com

    All code given here is UNTESTED unless otherwise stated.

Re: CISCO Log file pattern matching (again!)
by zengargoyle (Deacon) on Aug 25, 2003 at 09:02 UTC

    i would do it like this...

    while (<DATA>) {
        # index() returns -1 when the tag is absent, so test for >= 0
        next unless index($_, '%SEC-6-IPACCESSLOG') >= 0;
        my @f = split /(?:[:,]*\s+:?|\/)/;
        print join($/, @f), $/;
    }
    __DATA__
    Aug 22 11:46:27 masterudp003210uds.netops.msnyuhealth.org 148526: Aug 22 15:46:26 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.4.12.2 53 -> 10.7.151.48 :8/0:, 1 packet
    
    $ perl log.pl 
    Aug
    22
    11:46:27
    masterudp003210uds.netops.msnyuhealth.org
    148526
    Aug
    22
    15:46:26
    UTC
    %SEC-6-IPACCESSLOGDP
    list
    101
    denied
    icmp
    10.4.12.2
    53
    ->
    10.7.151.48
    8
    0
    1
    packet
    

    then insert various checks on the fields after splicing or shifting them off the front (probably a switch of some sort on the %PROCESS doing the logging)
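One way to get a "switch of some sort" in Perl is a dispatch table keyed on the %PROCESS tag. The sketch below reuses the split pattern from this reply; the `%handler` table, the `dispatch_line` helper and the handler body are placeholders invented for illustration, and the sample line assumes the source address is 10.4.12.253.

```perl
use strict;
use warnings;

# Handlers keyed on the tag; each receives the fields that follow it.
# The body here is a placeholder that just picks out three fields.
my %handler = (
    '%SEC-6-IPACCESSLOGDP' => sub {
        my @f = @_;    # list N action proto src -> dst type code count packet
        return { proto => $f[3], src => $f[4], dst => $f[6] };
    },
);

# Hypothetical helper: split the line, peel the fixed fields off the
# front, then hand the remainder to the handler for that tag.
sub dispatch_line {
    my ($line) = @_;
    my @f = split /(?:[:,]*\s+:?|\/)/, $line;
    my @stamp  = splice @f, 0, 3;    # local month, day, time
    my $router = shift @f;
    my $seqno  = shift @f;
    my @rstamp = splice @f, 0, 4;    # router month, day, time, zone
    my $tag    = shift @f;
    return unless defined $tag;      # line too short
    my $code = $handler{$tag} or return;    # no handler: ignore the line
    return $code->(@f);
}

my $line = 'Aug 22 11:46:27 masterudp003210uds.netops.msnyuhealth.org 148526: '
         . 'Aug 22 15:46:26 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp '
         . '10.4.12.253 -> 10.7.151.48 :8/0:, 1 packet';

my $hit = dispatch_line($line);
print "$hit->{proto}: $hit->{src} -> $hit->{dst}\n" if $hit;
```

Supporting another message type then means adding one more entry to the table rather than another branch in the loop.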

      The main reason I do not use split to work these log lines is that I am actually doing stuff based on the matching that I am doing. I am also hoping to use the date stamps.

      Overall what I am developing is a script to search for hosts around three campuses (20,000 hosts+) that may or may not be infected with the latest round of viruses based on traffic patterns.



        using next/split on a mere 12 hours of logs => 2m12s. using a regex to attempt to match lines => 2m43s.

        not a lot, but it grows the bigger the files get.

        once the line is split...

        my @dateinfo = splice @f, 0, 3; ...

        trust me, it will be faster to split. especially if your logs get large.
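The timings quoted above come from real logs; anyone wanting to check the claim on their own data could compare the two approaches with the core Benchmark module. What follows is a rough sketch under stated assumptions: the subs `via_regex` and `via_split` and the pattern `$acl_re` are illustrative stand-ins rather than either poster's actual code, and a single in-memory line is not representative of a 12-hour log, so the numbers will vary with the pattern, the data and the perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = 'Aug 22 11:46:27 masterudp003210uds.netops.msnyuhealth.org 148526: '
         . 'Aug 22 15:46:26 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp '
         . '10.4.12.253 -> 10.7.151.48 :8/0:, 1 packet';

# Illustrative pattern: captures only the source and destination.
my $acl_re = qr/%SEC-6-IPACCESSLOGD?P:\slist\s\d+\s\w+\s\w+\s([\d.]+)\s->\s([\d.]+)/;

sub via_regex {
    my ($l) = @_;
    return $l =~ $acl_re ? ($1, $2) : ();
}

sub via_split {
    my ($l) = @_;
    return unless index($l, '%SEC-6-IPACCESSLOG') >= 0;   # cheap pre-filter
    my @f = split ' ', $l;
    return @f[14, 16];                                    # source, destination
}

# Run each candidate for at least one CPU second and print a comparison.
cmpthese(-1, {
    regex => sub { my @r = via_regex($line) },
    split => sub { my @r = via_split($line) },
});
```

Both subs return the same pair of addresses for the sample line, so the comparison is at least like for like.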

Re: CISCO Log file pattern matching (again!)
by mr_stru (Sexton) on Aug 25, 2003 at 03:39 UTC

    There are two things.

    The first is that you don't allow numbers in the domain name and in the example there are numbers.

    The second is that the 53 before the arrow isn't matched anywhere.

    Struan

      The reformatting that splits long lines caught you:
      +   .   .   .   .   .   list 101 denied icmp 10.4.12.2
      +53 -> 10.7.151.48 :8/0:, 1 packet

      The '53' was actually a continuation of "10.4.12.253".   (Well, I'm pretty sure anyway ... it's 4am here ... ;-)
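If the wrap theory is right, undoing the wrap should give a line that the tail of the original pattern matches. A small sketch of that check, assuming the wrapping dropped no characters (so pieces are joined without adding a space) and that no genuine line starts with a '+'; the `unwrap` helper is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: a line starting with '+' is treated as a
# continuation of the previous line and glued onto it.
sub unwrap {
    my @out;
    for my $piece (@_) {
        if (@out && $piece =~ /^\+(.*)/s) { $out[-1] .= $1 }
        else                              { push @out, $piece }
    }
    return @out;
}

my ($joined) = unwrap(
    'list 101 denied icmp 10.4.12.2',
    '+53 -> 10.7.151.48 :8/0:, 1 packet',
);
print "$joined\n";

# The tail of the pattern from the question, written out inline.
my $ipaddr = qr/\d+\.\d+\.\d+\.\d+/;
if ($joined =~ /($ipaddr)\s->\s($ipaddr)\s:\d+\/\d+:,\s(\d+)\spacket/) {
    print "source $1, destination $2, packets $3\n";
}
```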