I have a syslog server that receives almost 80 messages per second from all our systems, and I'm trying to implement a solution that makes those messages easier to digest. The server sends an email every day with all of that day's messages, and it has now grown to several megabytes. Who wants to read every line?

My plan was to have my Perl script generate an HTML page of collapsible lists, something like:

- host
  - process
    - error type (count)
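A minimal sketch of that host/process/error-type structure as a nested hash, rendered with HTML `<details>`/`<summary>` elements so each level collapses without any JavaScript. The entries, and the `render_html` and `esc` helpers, are hypothetical and only for illustration; the counts are borrowed from the example output further down.

```perl
use strict;
use warnings;

# host -> process -> error type -> count (hypothetical example entries)
my %tree;
$tree{'mail2-out'}{'ntpd'}{'sendto(*): Bad file descriptor'} += 5;
$tree{'mail2-out'}{'postfix/smtp'}{'warning: valid_hostname: empty hostname'} += 1;
$tree{'infocache02'}{'sendmail'}{'SYSERR(root): savemail: cannot save rejected email anywhere'} += 2;

# Minimal HTML escaping so message text cannot break the markup
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# <details>/<summary> collapse natively in modern browsers
sub render_html {
    my ($tree) = @_;
    my $html = '';
    for my $host (sort keys %$tree) {
        $html .= '<details><summary>' . esc($host) . "</summary>\n";
        for my $proc (sort keys %{ $tree->{$host} }) {
            my $types = $tree->{$host}{$proc};
            $html .= '  <details><summary>' . esc($proc) . "</summary>\n    <ul>\n";
            # most frequent error types first, ties broken alphabetically
            for my $type (sort { $types->{$b} <=> $types->{$a} || $a cmp $b } keys %$types) {
                $html .= "      <li>($types->{$type}) " . esc($type) . "</li>\n";
            }
            $html .= "    </ul>\n  </details>\n";
        }
        $html .= "</details>\n";
    }
    return $html;
}

print "<html><body>\n", render_html(\%tree), "</body></html>\n";
```

Sorting the keys keeps the page stable from day to day, which makes two daily reports easier to compare by eye.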

But since the messages come from a variety of sources/daemons, in a variety of formats, I'm kind of stumped on how to approach parsing them.

If the format were always the same, I'd have no problem. Even if there were only a fixed number of formats, I could (time-consumingly) come up with a regex for each one, but that isn't practical because the exact formats aren't known in advance.
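One way around writing a regex per message format might be to mask the value types that vary instead (IP addresses, sendmail queue IDs, hostnames, bare numbers); the remaining text then becomes a stable key shared by every message of the same kind. A rough sketch, assuming these four patterns, which were eyeballed from the sample and will almost certainly need extending; `@masks` and `normalize` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical masks, applied in order: IPs are replaced before the generic
# hostname and number patterns get a chance to match their pieces.
my @masks = (
    [ qr/\b\d{1,3}(?:\.\d{1,3}){3}\b/,    '<ip>'   ],
    [ qr/\b(?:qf)?[A-Za-z0-9]{8}\d{6}\b/, '<qid>'  ],  # sendmail queue IDs, e.g. l560aB7V017120
    [ qr/\b(?:[\w-]+\.)+[A-Za-z]{2,}\b/,  '<host>' ],
    [ qr/\b\d+\b/,                        '<n>'    ],
);

sub normalize {
    my ($msg) = @_;
    for my $mask (@masks) {
        my ($re, $tag) = @$mask;
        $msg =~ s/$re/$tag/g;
    }
    return $msg;
}

print normalize($_), "\n" for (
    'sendto(192.168.4.10): Bad file descriptor',
    'l560aB7V017120: Losing ./qfl560aB7V017120: savemail panic',
    'warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com',
);
```

With these masks the three "Losing" lines in the sample all normalize to `<qid>: Losing ./<qid>: savemail panic`, so they can be counted under a single hash key.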

So the part of the script I'm stuck on is an adaptive parsing algorithm that builds a nested hashref "tree".

I even thought about using a complex set of substr and index calls, but my head hurts after a while of trying to figure out how to make that adaptive.
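Splitting on whitespace and comparing token by token may be easier to reason about than substr/index offsets. A minimal sketch; the `differing_positions` helper is hypothetical, and it only compares messages that have the same number of tokens.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two messages token by token.
# Returns an arrayref of the token positions that differ, or undef when the
# messages have different token counts (they are not comparable this way).
sub differing_positions {
    my ($x, $y) = @_;
    my @left  = split ' ', $x;    # ' ' splits on runs of whitespace, like awk
    my @right = split ' ', $y;
    return undef if @left != @right;
    return [ grep { $left[$_] ne $right[$_] } 0 .. $#left ];
}

my $d = differing_positions(
    'sendto(192.168.4.10): Bad file descriptor',
    'sendto(192.168.4.20): Bad file descriptor',
);
print "differs at token(s): @$d\n";
```

For the two ntpd lines above only token 0 differs, which is exactly the part a report would want to replace with a placeholder.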

Here's a sample of the syslog messages:

mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor
mail2-out - postfix/smtp: warning: valid_hostname: empty hostname
mail2-out - postfix/smtp: warning: malformed domain name in resource data of MX record for hotmil.com:
mail2-out - ntpd: sendto(192.168.4.20): Bad file descriptor
mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor
mail2-out - postfix/smtp[32282]: warning: numeric domain name in resource data of MX record for uyahoo.com: 10.0.0.2
mail2-out - ntpd: sendto(192.168.4.20): Bad file descriptor
mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor
infocache02 - ldap_cachemgr: libsldap: Status: 91 Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
infocache02 - ldap_cachemgr: Error: Unable to refresh from profile:tls_automount_profile. (error=1)
infocache02 - sendmail: l560aB7V017120: Losing ./qfl560aB7V017120: savemail panic
infocache02 - sendmail: l560aB7V017120: SYSERR(root): savemail: cannot save rejected email anywhere
infocache02 - sendmail: l1FM2rFa026352: Losing ./qfl1FM2rFa026352: savemail panic
infocache02 - sendmail: l1FM2rFa026352: SYSERR(root): savemail: cannot save rejected email anywhere
infocache02 - sendmail: l1FI2rFa022597: Losing ./qfl1FI2rFa022597: savemail panic
mail2-in - postfix/smtpd: warning: 190.55.102.166: hostname cpe-190-55-102-166.telecentro.com.ar verification failed: hostname nor servname provided, or not known
mail2-in - postfix/smtpd: warning: 201.29.80.154: hostname 20129080154.user.veloxzone.com.br verification failed: hostname nor servname provided, or not known
mail2-in - postfix/smtpd: warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com
mail2-in - postfix/smtpd: warning: 84.9.96.201: address not listed for hostname mail.intechcentre.com
mail2-in - postfix/smtpd: warning: 190.8.87.73: hostname din-190-8-87-73.manquehue.net verification failed: hostname nor servname provided, or not known
mail2-in - postfix/smtpd: warning: 190.8.87.73: hostname din-190-8-87-73.manquehue.net verification failed: hostname nor servname provided, or not known
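Every line in this sample appears to share the same envelope, `host - process[pid]: message`, with the `[pid]` part optional. If that holds for all sources, the host and process levels of the tree need just one regex, and only the message text is free-form. A sketch under that assumption; the `$envelope` regex and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical envelope: host, " - ", process name, optional "[pid]", ": ", message
my $envelope = qr{^(\S+) - ([^:\[\s]+)(?:\[\d+\])?: (.*)$};

my @sample = (
    'mail2-out - ntpd: sendto(192.168.4.10): Bad file descriptor',
    'mail2-out - postfix/smtp: warning: valid_hostname: empty hostname',
    'mail2-out - postfix/smtp[32282]: warning: numeric domain name in resource data of MX record for uyahoo.com: 10.0.0.2',
    'infocache02 - sendmail: l560aB7V017120: SYSERR(root): savemail: cannot save rejected email anywhere',
);

my %tree;    # host -> process -> raw message text -> count
for my $line (@sample) {
    # a list assignment in scalar context yields the number of values the
    # match returned, so a non-matching line gives 0 and is skipped
    my ($host, $proc, $msg) = $line =~ $envelope or next;
    $tree{$host}{$proc}{$msg}++;
}

for my $host (sort keys %tree) {
    for my $proc (sort keys %{ $tree{$host} }) {
        printf "%s / %s: %d distinct message(s)\n",
            $host, $proc, scalar keys %{ $tree{$host}{$proc} };
    }
}
```

Because the optional `[pid]` is matched outside the capture, `postfix/smtp` and `postfix/smtp[32282]` end up under the same process key.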

And here is how I would want it parsed (imagine Data::Dumper output, but laid out as an indented tree). Once it's parsed, I can figure out how to produce output like this:

infocache02
	ldap_cachemgr
		(1) libsldap: Status: 91  Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
		(1) Error: Unable to refresh from profile:tls_automount_profile. (error=1)
	sendmail
		(3) **************: Losing ************** savemail panic
		(2) SYSERR(root): savemail: cannot save rejected email anywhere
mail2-out
	ntpd
		(5) sendto(************): Bad file descriptor
	postfix/smtp
		(1) warning: valid_hostname: empty hostname
		(1) warning: malformed domain name in resource data of MX record for ********:
		(1) warning: numeric domain name in resource data of MX record for *******: ********
mail2-in
	postfix/smtpd
		(4) warning: ********: hostname ********** verification failed: hostname nor servname provided, or not known
		(2) warning: ********: address not listed for hostname **********
Basically, what I could use a hand with is figuring out how to have the code determine which parts of each line are variable (a single line may have several variable parts). It would rely on analyzing all the other available messages and keeping track of which parts of a line vary relative to similar lines. I seem to remember Perl has fuzzy regex matching, but I don't know what kind of regex would be fuzzy enough, heh. I've been writing Perl for a few years now, but I've never run into this kind of problem before and would appreciate any help and pointers in the right direction.
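A minimal sketch of one possible heuristic, assuming whitespace tokenization: within one host/process group, each message is compared position by position with existing templates that have the same token count. If at least half the positions agree, the message is folded into that template and the disagreeing positions become `*`; otherwise it starts a new template. This is a simplified take on the general idea behind log template mining, not a faithful implementation of any published tool, and `add_message` and the 0.5 threshold are hypothetical choices to tune.

```perl
use strict;
use warnings;

my $THRESHOLD = 0.5;   # hypothetical: fraction of token positions that must agree

# $templates: arrayref of { tokens => [...], count => N } for ONE host/process group
sub add_message {
    my ($templates, $msg) = @_;
    my @tok = split ' ', $msg;
    for my $t (@$templates) {
        my $tt = $t->{tokens};
        next unless @$tt == @tok;                        # only same-length templates
        my $same = grep { $tt->[$_] eq $tok[$_] } 0 .. $#tok;
        next unless $same >= $THRESHOLD * @tok;          # not similar enough
        $tt->[$_] = '*' for grep { $tt->[$_] ne $tok[$_] } 0 .. $#tok;
        $t->{count}++;
        return;
    }
    push @$templates, { tokens => \@tok, count => 1 };   # start a new template
}

my @sendmail_templates;
add_message(\@sendmail_templates, $_) for (
    'l560aB7V017120: Losing ./qfl560aB7V017120: savemail panic',
    'l560aB7V017120: SYSERR(root): savemail: cannot save rejected email anywhere',
    'l1FM2rFa026352: Losing ./qfl1FM2rFa026352: savemail panic',
    'l1FM2rFa026352: SYSERR(root): savemail: cannot save rejected email anywhere',
    'l1FI2rFa022597: Losing ./qfl1FI2rFa022597: savemail panic',
);

printf "(%d) %s\n", $_->{count}, join(' ', @{ $_->{tokens} }) for @sendmail_templates;
```

On the sendmail lines from the sample this prints `(3) * Losing * savemail panic` and `(2) * SYSERR(root): savemail: cannot save rejected email anywhere`, which matches the counts in the desired output. In the full script, one such arrayref per `$tree{$host}{$proc}` would keep templates from different daemons apart. Because it works at token granularity, `sendto(192.168.4.10):` collapses to a plain `*` rather than `sendto(*)`; masking known value types (IPs, queue IDs) before tokenizing would keep more of that context. Short messages are the weak spot: at this threshold, two unrelated four-token warnings that share two words in the same positions would be merged.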

In reply to adaptive syslog message parsing by Anonymous Monk
