Here's a start:

use strict;
use warnings;

my $digest = bless {root => {}, maxLevel => 3};

$digest->add ($_) while <DATA>;
$digest->mergeTails ();
$digest->print ();

# Peel leading fields (host, program) off the line and file what is left
# as a "tail" under them. Stops at maxLevel or when no separator is found.
sub add {
    my ($self, $line, $level, $context) = @_;

    $level ||= 1;
    $context ||= $self->{root};

    if ($level == $self->{maxLevel} or $line !~ s/(\S*?)\s*\W\s+//) {
        push @{$context->{tails}}, $line;
        return;
    }

    my $prefix = $1;

    $context->{$prefix} ||= {};
    $context = $context->{$prefix};
    $self->add ($line, 1 + $level, $context);
}

# Collapse the tails in each leaf context into [mask, count] pairs.
sub mergeTails {
    my ($self, $context) = @_;

    $context ||= $self->{root};

    unless (exists $context->{tails}) {
        $self->mergeTails ($context->{$_}) for keys %$context;
        return;
    }

    my @tails = sort {length $a <=> length $b} @{$context->{tails}};
    my @groups;

    # Only tails of equal length are merged, so bucket them by length
    push @{$groups[length $_]}, $_ for @tails;
    @groups = grep {defined $_} @groups;

    for my $group (@groups) {
        my $mask = pop @$group;
        my $count = 1;

        while (@$group) {
            my $str = pop @$group;
            my $mix = $mask ^ $str;          # \0 where the strings agree
            my $cpl = "\xff" x length $mix;

            $mix =~ tr/\0/\xff/c;            # \xff where they differ
            $mix = $mix ^ $cpl;              # invert: \0 where they differ
            $mask = $mask & $mix;            # zero the differing characters
            ++$count;
        }

        $mask =~ tr/\0/*/;                   # show differing characters as *
        push @{$context->{digest}}, [$mask, $count];
    }
}

# Print the tree. Tails still carry their newline, so none is added here.
sub print {
    my ($self, $context, $indent) = @_;

    $context ||= $self->{root};
    $indent ||= '';

    if (exists $context->{digest}) {
        print "$indent($_->[1]) $_->[0]" for @{$context->{digest}};
        return;
    }

    for (sort keys %$context) {
        print "$indent$_\n";
        $self->print ($context->{$_}, $indent . ' ');
    }
}

__DATA__
mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor
mail2-out - postfix/smtp: warning: valid_hostname: empty hostname
mail2-out - postfix/smtp: warning: malformed domain name in resource data of MX record for hotmil.com:
mail2-out - ntpd: sendto(192.168.4.20): Bad file descriptor
mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor
mail2-out - postfix/smtp[32282]: warning: numeric domain name in resource data of MX record for uyahoo.com: 10.0.0.2
mail2-out - ntpd: sendto(192.168.4.20): Bad file descriptor
mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor
infocache02 - ldap_cachemgr: libsldap: Status: 91 Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
infocache02 - ldap_cachemgr: Error: Unable to refresh from profile:tls_automount_profile. (error=1)
infocache02 - sendmail: l560aB7V017120: Losing ./qfl560aB7V017120: savemail panic
infocache02 - sendmail: l560aB7V017120: SYSERR(root): savemail: cannot save rejected email anywhere
infocache02 - sendmail: l1FM2rFa026352: Losing ./qfl1FM2rFa026352: savemail panic
infocache02 - sendmail: l1FM2rFa026352: SYSERR(root): savemail: cannot save rejected email anywhere
infocache02 - sendmail: l1FI2rFa022597: Losing ./qfl1FI2rFa022597: savemail panic
mail2-in - postfix/smtpd: warning: 190.55.102.166: hostname cpe-190-55-102-166.telecentro.com.ar verification failed: hostname nor servname provided, or not known
mail2-in - postfix/smtpd: warning: 201.29.80.154: hostname 20129080154.user.veloxzone.com.br verification failed: hostname nor servname provided, or not known
mail2-in - postfix/smtpd: warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com
mail2-in - postfix/smtpd: warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com
mail2-in - postfix/smtpd: warning: 190.8.87.73: hostname din-190-8-87-73.manquehue.net verification failed: hostname nor servname provided, or not known
mail2-in - postfix/smtpd: warning: 190.8.87.73: hostname din-190-8-87-73.manquehue.net verification failed: hostname nor servname provided, or not known

Prints:

infocache02
 ldap_cachemgr
  (1) Error: Unable to refresh from profile:tls_automount_profile. (error=1)
  (1) libsldap: Status: 91 Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
 sendmail
  (3) l*******0*****: Losing ./qfl*******0*****: savemail panic
  (2) l*******0*****: SYSERR(root): savemail: cannot save rejected email anywhere
mail2-in
 postfix/smtpd
  (2) warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com
  (2) warning: 190.8.87.73: hostname din-190-8-87-73.manquehue.net verification failed: hostname nor servname provided, or not known
  (1) warning: 201.29.80.154: hostname 20129080154.user.veloxzone.com.br verification failed: hostname nor servname provided, or not known
  (1) warning: 190.55.102.166: hostname cpe-190-55-102-166.telecentro.com.ar verification failed: hostname nor servname provided, or not known
mail2-out
 ntpd
  (5) sendto(192.168.4.*0): Bad file descriptor
 postfix/smtp
  (1) warning: valid_hostname: empty hostname
  (1) warning: malformed domain name in resource data of MX record for hotmil.com:
 postfix/smtp[32282]
  (1) warning: numeric domain name in resource data of MX record for uyahoo.com: 10.0.0.2

which doesn't quite digest the tails as you would like, but you could get a lot closer by dealing with matching runs of 'words' (/(\S+)/) rather than runs of characters. Algorithm::Diff would facilitate matching runs (left as an exercise for the reader).
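
As a rough illustration of the word-wise idea, here is a minimal sketch that masks whole 'words' instead of characters. It is a simplification and not the Algorithm::Diff approach: it only merges lines that have the same number of words, so unrelated messages with equal word counts would be lumped together, and lines of differing word counts are never aligned. The name merge_by_words is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: merge lines word by word. Positions where every
# line has the same word are kept; positions that differ become '*'.
# Assumption: only lines with the same word count are merged.
sub merge_by_words {
    my @lines = @_;
    my %groups;    # word count => [mask words, number of lines merged]

    for my $line (@lines) {
        my @words = $line =~ /(\S+)/g;
        my $group = $groups{scalar @words} ||= [[@words], 0];
        my $mask  = $group->[0];

        for my $i (0 .. $#words) {
            $mask->[$i] = '*' if $mask->[$i] ne $words[$i];
        }

        ++$group->[1];
    }

    # One [mask string, count] pair per word count, shortest first
    return map { [join (' ', @{$groups{$_}[0]}), $groups{$_}[1]] }
        sort { $a <=> $b } keys %groups;
}

my @digest = merge_by_words (
    'l560aB7V017120: Losing ./qfl560aB7V017120: savemail panic',
    'l1FM2rFa026352: Losing ./qfl1FM2rFa026352: savemail panic',
    'l1FI2rFa022597: Losing ./qfl1FI2rFa022597: savemail panic',
);

print "($_->[1]) $_->[0]\n" for @digest;
# prints: (3) * Losing * savemail panic
```

Because punctuation is part of a 'word' here, the trailing colons on the queue IDs are masked along with them. Splitting on a finer pattern than /(\S+)/ would keep more of the fixed text.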


DWIM is Perl's answer to Gödel

In reply to Re: adaptive syslog message parsing by GrandFather
in thread adaptive syslog message parsing by Anonymous Monk
