Dru has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'm in a bit of a bind here. I think I'm in over my head. I have large log files (2-3 GB) that have entries like this:
2003-12-22 15:48:35 Local4.Error 192.168.1.2 Dec 22 2003 20:48:25: %FW-3-106011: Deny inbound (No xlate) udp 4 dst inside:192.168.18.6/161
2003-12-22 23:52:00 Local4.Critical 192.168.1.2 Dec 23 2003 04:51:50: %FW-2-108002: SMTP replaced >: out 192.168.36.223 in 192.168.11.12 data: MAIL From: <123@hotmail.com>..
What I would like to do is pull out the unique FW messages (i.e. %FW-3-106011 and %FW-2-108002) and, when I print each one, include just one of the log entries with it. For example, I would like the output to look like this:
%FW-3-106011: 2003-12-22 15:48:35 Local4.Error 192.168.1.2 Dec 22 2003 20:48:25: %FW-3-106011: Deny inbound (No xlate) udp 4 dst inside:192.168.18.6/161
I know this isn't correct, but I gave it a stab:
    use strict;
    use warnings;

    my $file = 'd:\PROGRA~1\Syslogd\Logs\syslog22Dec2003.txt';
    open (FILE, $file) or die "Can't open $file: $!\n";
    my (@lines);
    while (<FILE>){
        push (@lines, $_) if /(\%FW\-\d-\d+)/;
        next unless $1 !~ /$1/; #my stupid logic
    }
I appreciate any help.

Thanks,
Dru

Replies are listed 'Best First'.
Re: Tricky Syslog Parsing
by Abigail-II (Bishop) on Jan 13, 2004 at 16:38 UTC
    I'd use something like this (untested):
    $/ = "";   # Paragraph mode.
    my %seen;
    open my $fh => "...." or die;
    while (<$fh>) {
        next unless /(%FW-\d+-\d+)/;
        print "$1\n$_" unless $seen {$1} ++;
    }

    This is just a variant on "remove duplicates".
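
For anyone who has not met the idiom, here is a minimal sketch of plain duplicate removal next to the variant used here, where the hash key is the extracted message ID rather than the whole line. The sub names uniq_items and first_per_id and the shortened sample lines are made up for illustration; only the regex comes from the thread.

```perl
use strict;
use warnings;

# Classic duplicate removal: keep an item only the first time it is seen.
sub uniq_items {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# The variant: the key is the extracted message ID, not the whole line.
# Returns [id, line] pairs in order of first appearance.
sub first_per_id {
    my (%seen, @firsts);
    for my $line (@_) {
        next unless $line =~ /(%FW-\d+-\d+)/;   # skip lines without an ID
        push @firsts, [$1, $line] unless $seen{$1}++;
    }
    return @firsts;
}

# Shortened, made-up sample lines (not real log output).
my @sample = (
    "2003-12-22 15:48:35 Local4.Error 192.168.1.2 %FW-3-106011: Deny inbound\n",
    "2003-12-22 23:52:00 Local4.Critical 192.168.1.2 %FW-2-108002: SMTP replaced\n",
    "2003-12-22 23:56:00 Local4.Error 192.168.1.2 %FW-3-106011: Deny inbound again\n",
);

# Print each ID once, followed by the first line that carried it.
print "$_->[0]\n$_->[1]" for first_per_id(@sample);
```

Using an array of pairs instead of a bare hash keeps the IDs in the order they first appear in the log, which a hash alone would not guarantee.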

    Abigail

Re: Tricky Syslog Parsing
by Old_Gray_Bear (Bishop) on Jan 13, 2004 at 16:40 UTC
    If you want the "unique somethings", use a hash. Here you key off of the message ID and store the line of text as the value.

    Code, untested --

    my %messages;
    while (<FILE>) {
        next unless /(%FW-\d+-\d+)/;     # extract MSG ID; skip lines without one
        next if exists $messages{$1};    # ignore it if we've already seen it
        $messages{$1} = $_;              # store the line of text as the value
    }

    %messages has the first occurrence of each message

    ----
    I Go Back to Sleep, Now.

    OGB

Re: Tricky Syslog Parsing
by blue_cowdawg (Monsignor) on Jan 13, 2004 at 16:56 UTC

    Having been faced with solving a very similar problem not that long ago, let me pass along one lesson that I learned: create your regexes with qr// and test each one, one at a time.

    Caveat: none of the following code has been tested.

    Looking at the examples you have provided, here are a few thoughts.

    my $dtg     = qr/\d+-\d+-\d+\s\d+:\d+:\d+/;            # Date time group
    my $logtype = qr/Local\d\.(?:Error|Critical)/;          # Log type
    my $ipaddr  = qr/\d+\.\d+\.\d+\.\d+/;                   # IP Address
    my $odtg    = qr/[A-Za-z]{3}\s\d+\s\d+\s\d+:\d+:\d+:/;  # Second date time group
    my $select  = qr/%FW-\d+-\d+/;                          # FW or PIX?
    my $match_line = qr/$dtg\s+$logtype\s+$ipaddr\s+$odtg\s+$select/;
    Now you can take each of the regexes that make up the big regex and test them one at a time to see if they work.
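
One way to do that piece-by-piece testing, as a rough sketch only. It assumes the component patterns are compiled with qr// (so the backslashes reach the regex engine intact) and that the log type is an alternation of Error and Critical. The helper name check_parts is made up; the test line is the first sample entry from the original post.

```perl
use strict;
use warnings;

# Sample line from the original post.
my $line = '2003-12-22 15:48:35 Local4.Error 192.168.1.2 Dec 22 2003 20:48:25: '
         . '%FW-3-106011: Deny inbound (No xlate) udp 4 dst inside:192.168.18.6/161';

# Component patterns, compiled with qr// so the backslashes survive.
my %part = (
    dtg     => qr/\d+-\d+-\d+\s\d+:\d+:\d+/,
    logtype => qr/Local\d\.(?:Error|Critical)/,
    ipaddr  => qr/\d+\.\d+\.\d+\.\d+/,
    odtg    => qr/[A-Za-z]{3}\s\d+\s\d+\s\d+:\d+:\d+:/,
    select  => qr/%FW-\d+-\d+/,
);

# Hypothetical helper: for each named pattern, return the text it matched,
# or undef if it did not match at all.
sub check_parts {
    my ($text, %re) = @_;
    my %got;
    for my $name (sort keys %re) {
        $got{$name} = $text =~ /($re{$name})/ ? $1 : undef;
    }
    return %got;
}

my %got = check_parts($line, %part);
printf "%-8s %s\n", $_, defined $got{$_} ? "matched '$got{$_}'" : 'NO MATCH'
    for sort keys %got;
```

Seeing what each piece actually captured makes it obvious which component is to blame when the combined pattern fails.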

    Two other comments:

    1. Are you trying to match %PIX or %FW?
    2. What does next unless $1 !~ /$1/; mean in the logic of your code?


    Peter L. Berghold -- Unix Professional
    Peter at Berghold dot Net
       Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.
      Much appreciated Monks. You still haven't let me down yet.

      This might not be the best method/code, but here's how I got it working. Any improvements welcome (I don't know why I have to use open twice, if I never close FILE, but it doesn't work otherwise):
      my $file = 'd:\PROGRA~1\Syslogd\Logs\syslog22Dec2003.txt';
      open (FILE, $file) or die "Can't open $file: $!\n";
      my %messages;
      while(<FILE>) {
          /(FW\-\d+\-\d+)/;                  # extract MSG ID
          next if exists($messages{$1});     # ignore it if we've
          $messages{$1} = $1;                # already seen it.
      }

      my $count;
      open (FILE, $file) or die "Can't open $file: $!\n";
      for my $i (keys %messages){
          print "\n$i:\n\n";
          $count = 0;
          while(<FILE>){
              if (/$i/){
                  print "$_\n";
                  $count++;
              }
              last if $count == 1;
          }
      }
      close FILE;
        Your first loop reads through the entire file, which leaves the filehandle at end-of-file. Opening the file a second time starts you from the top again.
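
The effect is easy to reproduce without a real log. A minimal sketch, assuming an in-memory filehandle with made-up contents as a stand-in for the log file; seek() is a core Perl function and rewinds an already open handle, which is one alternative to opening the file twice.

```perl
use strict;
use warnings;

# Made-up stand-in for the log file, read through an in-memory filehandle.
my $data = "line one\nline two\nline three\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @first_pass = <$fh>;   # reads everything; the handle is now at EOF
my @at_eof     = <$fh>;   # nothing left to read, so this list is empty

seek($fh, 0, 0) or die "Can't rewind: $!";   # jump back to the start
my @second_pass = <$fh>;  # the whole file again

printf "first: %d, at EOF: %d, after seek: %d\n",
    scalar @first_pass, scalar @at_eof, scalar @second_pass;

close $fh;
```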

        If your logfile consistently contains three line entries, you could loop on the date regex and then readline() in the next two lines, parse all three at the same time, then loop back up again. The pointer will follow your place in the file and you'll be right at the next timestamp entry.

        open(FILE, $file) or die "Can't open $file: $!\n";
        while (my $line1 = <FILE>) {
            my $line2 = readline(*FILE);
            my $line3 = readline(*FILE);
            # handle $line1, $line2, $line3 all at once, then
            # loop back up for the next entry
        }
Re: Tricky Syslog Parsing
by Art_XIV (Hermit) on Jan 13, 2004 at 19:21 UTC

    This handles it on an entry-by-entry basis:

    use warnings;
    use strict;
    use Data::Dumper;

    my %fws   = ();
    my @queue = ();

    while (<DATA>) {
        my $line = $_;
        if ($line =~ /^\d{4}-\d\d-\d\d/) {
            process_queue(@queue);
            @queue = ();
        }
        push @queue, $line;
    }
    process_queue(@queue) if scalar(@queue) > 0;

    sub process_queue {
        my @entries = @_;
        foreach my $entry (@entries) {
            #the regex below should be modified
            #to suit your actual needs
            if ($entry =~ /(%FW-\d-\d{6})/) {
                my $fw = $1;
                $fws{$fw} = join('', @entries) unless exists $fws{$fw};
            }
        }
    }

    print Dumper(\%fws);

    1;

    __DATA__
    2003-12-22 15:48:35 Local4.Error 192.168.1.2 Dec 22 2003 20:48:25: %FW-3-106011: Deny inbound (No xlate) udp 4 dst inside:192.168.18.6/161
    2003-12-22 23:52:00 Local4.Critical 192.168.1.2 Dec 23 2003 04:51:50: %FW-2-108002: SMTP replaced >: out 192.168.36.223 in 192.168.11.12 data: MAIL From: <123@hotmail.com>..
    2003-12-22 23:56:00 Local4.Error 192.168.1.2 Dec 22 2003 20:48:25: %FW-3-106011: Insert Worm data: MAIL FROM <foo@bar.com>
    2003-12-22 23:57:22 Local4.Oops 192.168.1.2 Dec 22 2003 20:48:25: %FW-3-106011: Deny Involvement (No xlate) udp 4 dst inside:192.168.18.6/161
    2003-12-22 23:58:33 Local4.Critical 192.168.1.2 Dec 23 2003 04:51:50: %FW-2-108002: SMTP blow'd up out 192.168.36.223 in 192.168.11.12 data: MAIL From: <spammy.spammington@spam.net>..
    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
Re: Tricky Syslog Parsing
by elwarren (Priest) on Jan 13, 2004 at 21:50 UTC
    I've got a personal project on the back burner to parse the logfile emails I receive from my D-Link router at home. In my notes I have earmarked a few modules to help me write it when I get around to it. You may find these modules helpful:

    SyslogScan contains routines to parse system logs. The package includes a sample application, read_mail_log.pl, which can print out various statistics about mail sent and received.

    Parse::Syslog - Parse Unix syslog files

    DBD::File - Base class for writing DBI drivers for plain files

    AnyData::Format::Text - I thought there was an AnyData::Format::Syslog module but I was wrong...

    HTH