Hi Monks, Wisdom is what I seek

I have been trying to process several different log formats with some success, but the mail ones have me a little stuck.

Here is what my source data looks like

May 2 07:06:20 lon.mail.net exim[1234]: 2012-05-02 07:06:20 1PSPtU-0004en-1e <= it_ndt_bounces@new.itunes.com H=smtpmail.com [21.5.10.4] I=[8.4.14.4]:25 P=esmtp S=1966 id=1603882764.112965659.1335927964793.Mail.cboxp@ednabay.apple.com T="New on iTunes: One Thing And, Then Another, Cooking Apps,\n Great Deals on First Seasons, and M"
May 2 07:06:20 lon.mail.net exim[1234]: 2012-05-02 07:06:20 1PSPtU-0004en-1e <= it_ndt_bounces@new.itunes.com H=smtpmail.com [21.5.10.4] I=[8.4.14.4]:25 P=esmtp S=1966 id=1603882764.112965659.1335927964793.Mail.cboxp@ednabay.apple.com T="New on iTunes: One Thing And, Then Another, Cooking Apps,\n Great Deals on First Seasons, and M"
May 2 07:06:20 lon.mail.net exim[1235]: 2012-05-02 07:06:20 1PSPtU-0004en-1e => peterpiper <peterpiper@nosuchdomain.net> R=local_mail T=local_maildir_mail_drop

I now have code which processes basic syslog-type entries into a number of fields:

#!/usr/bin/perl
use strict;
use warnings;
no warnings q{uninitialized};

while (my $line = <STDIN>) {
    chomp($line);
    my ( $mon, $day, $time, $loghost, $prog, $remainder ) =
        split m{:?\s+}, $line, 6;
    my %monthNos = do {
        my $no = 0;
        map { $_ => ++ $no } qw{ Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec };
    };
    my ( $user ) = $remainder =~ m{user=([^,]+)};
    my ( $rip ) = $remainder =~ m{rip=([^,]+)};
    $remainder =~ tr/"/'/;
    my $yr = q{2012};
    my $csv = sprintf q{%02d/%02d/%s %s,%s,%s,"%s",%s,%s},
        $day, $monthNos{ $mon }, $yr, $time, $loghost, $prog, $remainder, $user, $rip;
    print "$csv\n";
}

My problem now is that in exim the fields appear to mean different things depending on whether the line contains <=, =>, == or even **.

Because my files potentially contain millions of lines, I am looking for an efficient way of effectively saying:

if contains <= then ......
else if contains => then .....
else if contains == then ....
else if contains ** then ....
else something else, etc.
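To make the idea concrete, here is a minimal sketch of one way that check might look: capture the flag with a single regex and look it up in a hash, rather than testing each flag in turn. The sub name classify_exim, the hash %kind_for and the labels are made up for illustration, and the regex assumes the flag always follows the message id as in the sample lines above (the sample lines in the code are shortened versions of those).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each exim flag to a label. The labels are illustrative names only,
# chosen for this sketch; exim itself does not emit them.
my %kind_for = (
    '<=' => 'arrival',
    '=>' => 'delivery',
    '==' => 'deferred',
    '**' => 'failed',
);

# Classify one log line by the flag that follows the message id.
# Assumes the layout seen in the sample data:
#   "... exim[pid]: YYYY-MM-DD HH:MM:SS <message-id> <flag> ..."
sub classify_exim {
    my ($line) = @_;
    my ($flag) = $line =~ m{\]: \S+ \S+ \S+ (<=|=>|==|\*\*) };
    return defined $flag ? $kind_for{$flag} : 'other';
}

# Shortened versions of the sample lines, embedded so the sketch runs as is.
my @sample = (
    'May 2 07:06:20 lon.mail.net exim[1234]: 2012-05-02 07:06:20 1PSPtU-0004en-1e <= it_ndt_bounces@new.itunes.com H=smtpmail.com [21.5.10.4] P=esmtp S=1966',
    'May 2 07:06:20 lon.mail.net exim[1235]: 2012-05-02 07:06:20 1PSPtU-0004en-1e => peterpiper <peterpiper@nosuchdomain.net> R=local_mail T=local_maildir_mail_drop',
);

print classify_exim($_), "\n" for @sample;
```

In a real run the loop over @sample would be replaced by the while (<STDIN>) loop, and each label could select the field-parsing code for that kind of line. Whether this is faster than a chain of index() or regex tests on this data would need benchmarking.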

I would also welcome any tips on whether the way I am doing this now could be made faster.
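One candidate for speeding things up: the script above rebuilds the month lookup hash for every line read. A rough sketch of building it once, before the loop, is below. The names %month_no and day_month are hypothetical, and the actual gain on millions of lines is untested here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Built once, before any line is read, instead of once per line.
my $no = 0;
my %month_no = map { $_ => ++$no }
    qw{ Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec };

# Convert the syslog "Mon D" prefix to "DD/MM".
# Returns undef (or an empty list) for an unknown month name.
sub day_month {
    my ( $mon, $day ) = @_;
    return unless exists $month_no{$mon};
    return sprintf '%02d/%02d', $day, $month_no{$mon};
}

print day_month( 'May', 2 ), "\n";
```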

Many thanks in advance,

Steve


In reply to Process mail logs by stevbutt
