in reply to adaptive syslog message parsing

Here's a start:

use strict;
use warnings;

my $digest = bless {root => {}, maxLevel => 3};

$digest->add ($_) while <DATA>;
$digest->mergeTails ();
$digest->print ();

sub add {
    my ($self, $line, $level, $context) = @_;

    $level ||= 1;
    $context ||= $self->{root};

    if ($level == $self->{maxLevel} or $line !~ s/(\S*?)\s*\W\s+//) {
        push @{$context->{tails}}, $line;
        return;
    }

    my $prefix = $1;

    $context->{$prefix} ||= {};
    $context = $context->{$prefix};
    $self->add ($line, 1 + $level, $context);
}

sub mergeTails {
    my ($self, $context) = @_;

    $context ||= $self->{root};

    unless (exists $context->{tails}) {
        $self->mergeTails ($context->{$_}) for keys %$context;
        return;
    }

    my @tails = sort {length $a <=> length $b} @{$context->{tails}};
    my @groups;

    push @{$groups[length $_]}, $_ for @tails;
    @groups = grep {defined $_} @groups;

    for my $group (@groups) {
        my $mask = pop @$group;
        my $count = 1;

        while (@$group) {
            my $str = pop @$group;
            my $mix = $mask ^ $str;
            my $cpl = "\xff" x length $mix;

            $mix =~ tr/\0/\xff/c;
            $mix = $mix ^ $cpl;
            $mask = $mask & $mix;
            ++$count;
        }

        $mask =~ tr/\0/*/;
        push @{$context->{digest}}, [$mask, $count];
    }
}

sub print {
    my ($self, $context, $indent) = @_;

    $context ||= $self->{root};
    $indent ||= '';

    if (exists $context->{digest}) {
        print "$indent($_->[1]) $_->[0]" for @{$context->{digest}};
        return;
    }

    for (sort keys %$context) {
        print "$indent$_\n";
        $self->print ($context->{$_}, $indent . ' ');
    }
}

__DATA__
mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor
mail2-out - postfix/smtp: warning: valid_hostname: empty hostname
mail2-out - postfix/smtp: warning: malformed domain name in resource data of MX record for hotmil.com:
mail2-out - ntpd: sendto(192.168.4.20): Bad file descriptor
mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor
mail2-out - postfix/smtp[32282]: warning: numeric domain name in resource data of MX record for uyahoo.com: 10.0.0.2
mail2-out - ntpd: sendto(192.168.4.20): Bad file descriptor
mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor
infocache02 - ldap_cachemgr: libsldap: Status: 91 Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
infocache02 - ldap_cachemgr: Error: Unable to refresh from profile:tls_automount_profile. (error=1)
infocache02 - sendmail: l560aB7V017120: Losing ./qfl560aB7V017120: savemail panic
infocache02 - sendmail: l560aB7V017120: SYSERR(root): savemail: cannot save rejected email anywhere
infocache02 - sendmail: l1FM2rFa026352: Losing ./qfl1FM2rFa026352: savemail panic
infocache02 - sendmail: l1FM2rFa026352: SYSERR(root): savemail: cannot save rejected email anywhere
infocache02 - sendmail: l1FI2rFa022597: Losing ./qfl1FI2rFa022597: savemail panic
mail2-in - postfix/smtpd: warning: 190.55.102.166: hostname cpe-190-55-102-166.telecentro.com.ar verification failed: hostname nor servname provided, or not known
mail2-in - postfix/smtpd: warning: 201.29.80.154: hostname 20129080154.user.veloxzone.com.br verification failed: hostname nor servname provided, or not known
mail2-in - postfix/smtpd: warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com
mail2-in - postfix/smtpd: warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com
mail2-in - postfix/smtpd: warning: 190.8.87.73: hostname din-190-8-87-73.manquehue.net verification failed: hostname nor servname provided, or not known
mail2-in - postfix/smtpd: warning: 190.8.87.73: hostname din-190-8-87-73.manquehue.net verification failed: hostname nor servname provided, or not known

Prints:

infocache02
 ldap_cachemgr
  (1) Error: Unable to refresh from profile:tls_automount_profile. (error=1)
  (1) libsldap: Status: 91 Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
 sendmail
  (3) l*******0*****: Losing ./qfl*******0*****: savemail panic
  (2) l*******0*****: SYSERR(root): savemail: cannot save rejected email anywhere
mail2-in
 postfix/smtpd
  (2) warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com
  (2) warning: 190.8.87.73: hostname din-190-8-87-73.manquehue.net verification failed: hostname nor servname provided, or not known
  (1) warning: 201.29.80.154: hostname 20129080154.user.veloxzone.com.br verification failed: hostname nor servname provided, or not known
  (1) warning: 190.55.102.166: hostname cpe-190-55-102-166.telecentro.com.ar verification failed: hostname nor servname provided, or not known
mail2-out
 ntpd
  (5) sendto(192.168.4.*0): Bad file descriptor
 postfix/smtp
  (1) warning: valid_hostname: empty hostname
  (1) warning: malformed domain name in resource data of MX record for hotmil.com:
 postfix/smtp[32282]
  (1) warning: numeric domain name in resource data of MX record for uyahoo.com: 10.0.0.2

which doesn't quite digest the tails as you would like, but you could get a lot closer by dealing with matching runs of 'words' (/(\S+)/) rather than runs of characters. Algorithm::Diff would facilitate matching runs (left as an exercise for the reader).
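
For anyone puzzling over the character-level masking in mergeTails, here is that one step pulled out on its own. This is only a sketch: `mask_pair` is a hypothetical helper name (not in the code above), and it assumes two plain byte strings of equal length, which is what the grouping by length arranges.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the characters two equal-length strings share
# and replace every position where they differ with '*'.
sub mask_pair {
    my ($mask, $str) = @_;

    my $mix = $mask ^ $str;         # string XOR: "\0" wherever the characters agree
    $mix =~ tr/\0/\xff/c;           # turn every non-NUL (differing) byte into \xff
    $mix ^= "\xff" x length $mix;   # invert: \xff where equal, \0 where different
    $mask &= $mix;                  # string AND: differing positions become "\0"
    $mask =~ tr/\0/*/;              # show the zeroed positions as '*'
    return $mask;
}

print mask_pair (
    'sendto(192.168.4.10): Bad file descriptor',
    'sendto(192.168.4.20): Bad file descriptor',
), "\n";
# sendto(192.168.4.*0): Bad file descriptor
```

Because only single characters are masked, an address like 192.168.4.10 versus 192.168.4.20 leaves most of itself behind, which is why comparing whole words gives a cleaner digest.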


DWIM is Perl's answer to Gödel

Re^2: adaptive syslog message parsing
by GrandFather (Saint) on Jun 07, 2007 at 01:31 UTC

    Ok, I couldn't resist!

    Add:

    use Algorithm::Diff;

    toward the start. In sub add change:

    push @{$context->{tails}}, $line;

    to:

    push @{$context->{tails}}, [$line =~ /(\S+)/g];

    In sub mergeTails replace everything after:

    my @groups;

    with:

    push @{$groups[@$_]}, $_ for @tails;
    @groups = grep {defined $_} @groups;

    for my $group (@groups) {
        my @ref = @{$group->[-1]};
        my @org = @ref;
        my $count = 1;

        pop @$group;
        while (@$group) {
            my @new = @{pop @$group};
            my @diffs = Algorithm::Diff::diff (\@ref, \@new);

            for my $change (@diffs) {
                next unless $change->[0][0] eq '-';
                $ref[$change->[0][1]] = undef;
            }

            ++$count;
        }

        for (0 .. $#ref) {
            next if defined $ref[$_];
            $org[$_] = '*****';
        }

        push @{$context->{digest}}, [join (' ', @org), $count];
    }

    Now prints:

    infocache02
     ldap_cachemgr
      (1) Error: Unable to refresh from profile:tls_automount_profile. (error=1)
      (1) libsldap: Status: 91 Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
     sendmail
      (3) ***** Losing ***** savemail panic
      (2) ***** SYSERR(root): savemail: cannot save rejected email anywhere
    mail2-in
     postfix/smtpd
      (2) warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com
      (4) warning: ***** hostname ***** verification failed: hostname nor servname provided, or not known
    mail2-out
     ntpd
      (5) ***** Bad file descriptor
     postfix/smtp
      (1) warning: valid_hostname: empty hostname
      (1) warning: malformed domain name in resource data of MX record for hotmil.com:
     postfix/smtp[32282]
      (1) warning: numeric domain name in resource data of MX record for uyahoo.com: 10.0.0.2
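
    The word-level idea can also be tried without Algorithm::Diff. A minimal sketch, assuming every line in a group has the same number of words (which the grouping by word count arranges), so words can simply be compared position by position rather than diffed; `mask_words` is a hypothetical helper, not part of the code above:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: reduce lines with equal word counts to one template,
    # replacing each word position where any line differs with '*****'.
    sub mask_words {
        my @lines = @_;
        my @template = split ' ', shift @lines;

        for my $line (@lines) {
            my @words = split ' ', $line;

            # Assumption: @words has exactly as many entries as @template
            for my $i (0 .. $#template) {
                $template[$i] = '*****' if $template[$i] ne $words[$i];
            }
        }

        return join ' ', @template;
    }

    print mask_words (
        'l560aB7V017120: Losing ./qfl560aB7V017120: savemail panic',
        'l1FM2rFa026352: Losing ./qfl1FM2rFa026352: savemail panic',
        'l1FI2rFa022597: Losing ./qfl1FI2rFa022597: savemail panic',
    ), "\n";
    # ***** Losing ***** savemail panic
    ```

    Position-by-position comparison is cruder than a real diff: it cannot line up matching runs that have shifted, so it is only a stand-in for lines that already share a shape.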

      I admit, I lol'd when I read 'I couldn't resist'. I couldn't duplicate the output with the sample data using Algorithm::Diff; it was similar, but the newlines were off. Additionally, with a more complete set of data it outputs nothing but one line (with the number 6 in parentheses). It looks pretty promising on the short set of sample data, but I think it gets confused by the big set (which happens to use FQDNs instead of just hostnames).

        I've performed a little data cleansing before adding lines - omitting empty lines seems to be the main fix! I also changed from using undef to '' in the diff code (Algorithm::Diff seemed unhappy with undefs) and tidied up the output a little.
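
        The blank-line filtering mentioned above might look something like this. A sketch only: `clean_lines` is a hypothetical helper, and the exact cleansing used for the large data set isn't shown in this thread.

        ```perl
        use strict;
        use warnings;

        # Hypothetical pre-filter: strip trailing whitespace, drop lines that are
        # empty or whitespace-only, and hand back the rest newline-terminated.
        sub clean_lines {
            my @clean;

            for my $raw (@_) {
                (my $line = $raw) =~ s/\s+\z//;   # work on a copy, trim the line end
                next unless length $line;          # skip empty lines
                push @clean, "$line\n";
            }

            return @clean;
        }

        my @lines = clean_lines (
            "mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor\n",
            "\n",
            "   \n",
            "mail2-out - postfix/smtp: warning: valid_hostname: empty hostname  \n",
        );
        print scalar @lines, " lines kept\n";
        # 2 lines kept
        ```

        The cleaned lines could then be fed to add in place of the raw input.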

        Given the large data set, it prints in part:

        ...
        mail1-out.nyc.domain.com
         ntpd
          (169) ***** Bad file descriptor
         postfix/smtp
          (2) warning: malformed domain name in resource data of MX record for *****
          (32) warning: no MX host for ***** has a valid address record
          (18) warning: numeric domain name in resource data of MX record for ***** 127.0.1.50
          (2) warning: valid_hostname: empty hostname
         postfix/smtpd
          (7) warning: Illegal address syntax from ***** in RCPT command: <jane@lulu.co $>
         sm-mta
          (2) ***** SYSERR(root): ***** config error: mail loops back to me (MX problem?)
         syslog-ng
          (1) Changing permissions on special file /dev/console
        ...
        mail2-out.nyc.domain.com
         ntpd
          (168) ***** Bad file descriptor
         postfix/smtp
          (2) warning: malformed domain name in resource data of MX record for *****
          (25) warning: numeric domain name in resource data of MX record for ***** 10.0.0.2
          (2) warning: valid_hostname: empty hostname
         sm-mta
          (1) l55DmFcQ022740: SYSERR(root): localhost.fabulous.com. config error: mail loops back to me (MX problem?)
         syslog-ng
          (1) Changing permissions on special file /dev/console
        mail2-out.sfc.domain.com
         postfix/smtp
          (61) warning: malformed domain name in resource data of MX record for *****
          (1) warning: no MX host for epm.net has a valid address record
          (61) warning: valid_hostname: empty hostname
