This is about as far as I think I would go. The remaining duplicates are where the user appears to supply random junk in place of commands or addresses. You could certainly add a few more special cases if they are frequent.
The modified code:
    while( <> ) {
        next if /^\s*$/; ## Skip blank lines

        my( $src, $mode, $rest ) = m'
            ( ^ \S+ ) \s+ - \s+
            ( [^\[:]+ ) (?: \[ \d+ \] )? : \s*
            ( .+ $ )
        'x;

        if( $rest =~ m[warning: (?=.*Illegal address syntax)] ) {
            ++$log{ $src }{ $mode }{
                'warning: Illegal address syntax from **** in MAIL command: ****'
            };
            next;
        }

        if( $rest =~ m[warning: (?=.*non-SMTP command)] ) {
            ++$log{ $src }{ $mode }{ 'warning: **** non-SMPT command from ****' };
            next;
        }

        $rest =~ s[ (?: [\w-]+ \. ){1,} [\w-]+][****]gx; ## Remove fqdns
        $rest =~ s[ [a-z] \w+ \d : ][****]gx;            ## Server names?
        $rest =~ s[ [A-Z0-9]{11} : ][****]x;             ## Queue names
        $rest =~ s[ < [^>]+ > ][****]x;                  ## Common form of bad name

        ++$log{ $src }{ $mode }{ $rest };
    }
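The post does not show how `%log` is printed. Here is a minimal sketch of one way to do it, assuming a three-level layout (source, program, message) with plain `sort` ordering. The `report` sub, the indentation widths and the sample data are illustrative assumptions, not the original reporting code.

```perl
use strict;
use warnings;

# Hypothetical sample of the structure built by the loop above:
# $log{ source }{ program }{ normalised message } = count
my %log = (
    'mail1-in.nyc.domain.com' => {
        'postfix/smtpd' => {
            'warning: numeric hostname: ****'         => 2,
            'warning: valid_hostname: empty hostname' => 1,
        },
    },
);

# Assumed report layout: source, then program, then "(count) message".
sub report {
    my( $log ) = @_;
    my @lines;
    for my $src ( sort keys %$log ) {
        push @lines, $src;
        for my $mode ( sort keys %{ $log->{ $src } } ) {
            push @lines, "    $mode";
            my $msgs = $log->{ $src }{ $mode };
            push @lines, sprintf( '        (%d) %s', $msgs->{ $_ }, $_ )
                for sort keys %$msgs;
        }
    }
    return @lines;
}

print "$_\n" for report( \%log );
```

Returning the lines instead of printing them directly keeps the formatting easy to check or redirect.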
The results from the larger datasets:
    C:\test>619685 data\data.txt
    mail1-in.nyc.domain.com
        postfix/smtpd
            (5) warning: **** non-SMPT command from ****
            (2) warning: **** queue file size limit exceeded
            (2331) warning: ****: hostname **** verification failed: hostname nor servname provided, or not known
            (1) warning: ****: hostname docsis4-97 verification failed: hostname nor servname provided, or not known
            (13) warning: Illegal address syntax from **** in MAIL command: ****
            (2) warning: numeric hostname: ****
            (1) warning: valid_hostname: empty hostname
    mail1-in.sfc.domain.com
        postfix/smtpd
            (3) warning: **** non-SMPT command from ****
            (1508) warning: ****: hostname **** verification failed: hostname nor servname provided, or not known
            (8) warning: Illegal address syntax from **** in MAIL command: ****
        sendmail
            (480) File descriptors missing on startup: stderr; Bad file descriptor
    mail1-out.nyc.domain.com
        ntpd
            (169) sendto(****): Bad file descriptor
        postfix/smtp
            (2) warning: malformed domain name in resource data of MX record for ****:
            (32) warning: no MX host for **** has a valid address record
            (18) warning: numeric domain name in resource data of MX record for ****: ****
            (2) warning: valid_hostname: empty hostname
        postfix/smtpd
            (7) warning: Illegal address syntax from **** in MAIL command: ****
        sm-mta
            (2) **** SYSERR(root): ****. config error: mail loops back to me (MX problem?)
        syslog-ng
            (1) Changing permissions on special file /dev/console
    mail2-in.nyc.domain.com
        postfix/smtpd
            (2) warning: **** non-SMPT command from ****
            (1494) warning: ****: hostname **** verification failed: hostname nor servname provided, or not known
            (1) warning: ****: hostname t147235 verification failed: hostname nor servname provided, or not known
            (14) warning: Illegal address syntax from **** in MAIL command: ****
            (1) warning: valid_hostname: empty hostname
    mail2-in.sfc.domain.com
        postfix/smtpd
            (3) warning: **** non-SMPT command from ****
            (6) warning: **** queue file size limit exceeded
            (1716) warning: ****: hostname **** verification failed: hostname nor servname provided, or not known
            (9) warning: Illegal address syntax from **** in MAIL command: ****
            (3) warning: reject: ETRN ****... from ****[****]
            (1) warning: reject: ETRN [****]... from ****[****]
            (1) warning: valid_hostname: empty hostname
    mail2-out.nyc.domain.com
        ntpd
            (168) sendto(****): Bad file descriptor
        postfix/smtp
            (2) warning: malformed domain name in resource data of MX record for ****:
            (25) warning: numeric domain name in resource data of MX record for ****: ****
            (2) warning: valid_hostname: empty hostname
        sm-mta
            (1) **** SYSERR(root): ****. config error: mail loops back to me (MX problem?)
        syslog-ng
            (1) Changing permissions on special file /dev/console
    mail2-out.sfc.domain.com
        postfix/smtp
            (61) warning: malformed domain name in resource data of MX record for ****:
            (1) warning: no MX host for **** has a valid address record
            (61) warning: valid_hostname: empty hostname
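Two of the leftover one-off entries in these results are bare hostnames (`docsis4-97`, `t147235`) that the fqdn substitution does not touch because they contain no dot. If those turn out to be frequent, one more special case could fold them into the main "verification failed" bucket. The sketch below is only an illustration: `mask_unverified_hostname` is a hypothetical helper, not part of the script above, and it assumes the hostname is always a single whitespace-free token.

```perl
use strict;
use warnings;

# Hypothetical helper: mask whatever single token sits between
# "hostname" and "verification". The \b keeps "valid_hostname" from matching.
sub mask_unverified_hostname {
    my( $rest ) = @_;
    $rest =~ s[ \b hostname \s+ \S+ \s+ verification ][hostname **** verification]x;
    return $rest;
}

print mask_unverified_hostname(
    'warning: ****: hostname docsis4-97 verification failed: '
  . 'hostname nor servname provided, or not known'
), "\n";
```

Inside the loop the same substitution could be applied to `$rest` alongside the other `s[...][****]` lines. Messages that are already masked pass through unchanged.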
In reply to Re^3: adaptive syslog message parsing
by BrowserUk
in thread adaptive syslog message parsing
by Anonymous Monk