cajun has asked for the wisdom of the Perl Monks concerning the following question:

Being a sysadmin, a very large percentage of what I do with Perl is parsing log files to make some sort of report. While I've been successfully doing this, I'm wondering how others might accomplish this task. I'm reasonably sure my methods are not the best, so I'm looking for a better way.

Jul 6 14:36:41 moe postfix/smtp[15107]: A73DC113B63: to=<oetiker@concentric.com>, relay=adamant.concentric.com[207.155.248.168], delay=17, status=bounced (host adamant.concentric.com[207.155.248.168] said: 554 <oetiker@concentric.com>: Recipient address rejected: Unknown or invalid user oetiker@concentric.com (in reply to RCPT TO command))

Using the example above, let's say the data that I'm interested in is "A73DC113B63, 554, oetiker@concentric.com".

One way I can get this is:

while (<FILE>) {
    chomp;
    my @text = split /:/, $_;
    warn Dumper(@text);
}

My new friend (see Data::Dumper(::Simple) is your friend) tells me:

@text = (
    'Jul 6 14',
    '36',
    '41 moe postfix/smtp[15107]',
    ' A73DC113B63',
    ' to=<oetiker@concentric.com>, relay=adamant.concentric.com[207.155.248.168], delay=17, status=bounced (host adamant.concentric.com[207.155.248.168] said',
    ' 554 <oetiker@concentric.com>',
    ' Recipient address rejected',
    ' Unknown or invalid user oetiker@concentric.com (in reply to RCPT TO command))'
);

Fine (almost). My desired information is in:
$text[3] and $text[5]:

print "($text[3])\t($text[5])\n";

(Thanks itub, I was paying attention: Re: Data::Dumper(::Simple) is your friend.)

This tells me:

( A73DC113B63)  ( 554 <oetiker@concentric.com>)

Using this method I still have to remove the space in $text[3] AND do another split on $text[5], then remove the < > before my data is really what I'm looking for. And perhaps had I split differently to start with I would have better results. (I think I can see merlyn banging his head on the monitor now.)

Ok, I know what you guys and gals who do this every day are thinking (maybe): why didn't you use a regex and just pull out only the pieces of data that you wanted? Simple: my regex talents still suck (see Regex Tagging (newbie)).

Perhaps the answer I'm going to get is something like, "learn to use regex and simplify your life" (hmmmmm???). Maybe not. This is the reason I'm asking. I want to know if there is a better, easier, more efficient way to accomplish the task.

Thanks,
Mike

Update: Thanks ikegami. I had been experimenting with the split since posting and was actually going to update that 'split /: /' would have been a better choice. But your way is likely better.

Thanks Kanji. What I'm currently working on is customizing some SpamAssassin rules. Pflogsumm just won't give me the type of information I really need to do this.

Replies are listed 'Best First'.
Re: Parsing log files (still)
by GrandFather (Saint) on Jul 06, 2005 at 21:54 UTC

    with a regex it could look like this:

    use strict;
    use warnings;

    while (<DATA>) {
        # $1 = queue ID, $2 = SMTP reply code, $3 = recipient address
        print "queue ID $1, code $2, email $3.\n"
            if /]: ([a-f0-9]+):.*?said: (\d+) <(.+?)>/i;
    }

    Update: Gave the answer the OP wanted :-)
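
    For readers who find the one-line pattern hard to follow, the same idea can be spelled out with the /x modifier and named captures. This is only a sketch, not GrandFather's code: the sub name parse_bounce and the capture names qid, code and addr are made up for illustration, and named captures assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns (queue ID, SMTP code,
# address) for a bounce line, or an empty list when the line does not match.
sub parse_bounce {
    my ($line) = @_;
    return unless $line =~ m{
        \]: \s+ (?<qid>[0-9A-F]+) :     # queue ID after the process tag
        .*? said: \s+ (?<code>\d{3})    # three-digit SMTP reply code
        \s+ < (?<addr>[^>]+) >          # address between angle brackets
    }xi;
    return @+{qw(qid code addr)};
}

# The sample line from the original question, joined back into one string.
my $sample = 'Jul 6 14:36:41 moe postfix/smtp[15107]: A73DC113B63: '
           . 'to=<oetiker@concentric.com>, relay=adamant.concentric.com[207.155.248.168], '
           . 'delay=17, status=bounced (host adamant.concentric.com[207.155.248.168] said: '
           . '554 <oetiker@concentric.com>: Recipient address rejected: '
           . 'Unknown or invalid user oetiker@concentric.com (in reply to RCPT TO command))';

print join(', ', parse_bounce($sample)), "\n";
```

    Because the sub returns an empty list on a non-matching line, a caller can skip such lines with something like "my @f = parse_bounce($_) or next;".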

    Perl is Huffman encoded by design.
Re: Parsing log files (still)
by ikegami (Patriarch) on Jul 06, 2005 at 21:44 UTC

    Splitting on /:\s*/ will save you some work. Actually, let's split on /:\s+/ to avoid needlessly splitting up the date.
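
    The difference between the two patterns is easy to check on the sample line. The following is a small sketch for comparison only (the variable names @star and @plus are arbitrary); it counts the fields each pattern produces and shows where the queue ID and the status code end up.

```perl
use strict;
use warnings;

# The sample line from the original question, joined back into one string.
my $line = 'Jul 6 14:36:41 moe postfix/smtp[15107]: A73DC113B63: '
         . 'to=<oetiker@concentric.com>, relay=adamant.concentric.com[207.155.248.168], '
         . 'delay=17, status=bounced (host adamant.concentric.com[207.155.248.168] said: '
         . '554 <oetiker@concentric.com>: Recipient address rejected: '
         . 'Unknown or invalid user oetiker@concentric.com (in reply to RCPT TO command))';

my @star = split /:\s*/, $line;   # every colon splits, including those in 14:36:41
my @plus = split /:\s+/, $line;   # only colons followed by whitespace split

printf "%s gives %d fields, first field: %s\n", ':\s*', scalar @star, $star[0];
printf "%s gives %d fields, first field: %s\n", ':\s+', scalar @plus, $plus[0];
print  "queue ID: $plus[1]\n";
print  "code and address: $plus[3]\n";
```

    With /:\s+/ the timestamp stays in one piece, so the interesting fields move from positions 3 and 5 to positions 1 and 3, and the leading spaces are gone.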

    We have two ways of looking at item 3 (formerly item 5): 1) it's a space-separated list, and the brackets need to be removed from the second element (split approach), or 2) it's a string from which two substrings should be extracted (regexp approach). Since I'd use a regexp to remove the brackets, I might as well use a regexp for the whole thing.

    I also added two sanity checks, in case the line doesn't appear as we think.

    while (<DATA>) {
        chomp;
        my @parts = split /:\s+/, $_;
        next if $#parts < 3;

        my @parts_of_3 = $parts[3] =~ /^(\d+) <(.*)>$/;
        next unless @parts_of_3;

        my @result = ($parts[1], @parts_of_3);
        print(join(', ', @result), "\n");
    }

    __DATA__
    Jul 6 14:36:41 moe postfix/smtp[15107]: A73DC113B63: to=<oetiker@concentric.com>, relay=adamant.concentric.com[207.155.248.168], delay=17, status=bounced (host adamant.concentric.com[207.155.248.168] said: 554 <oetiker@concentric.com>: Recipient address rejected: Unknown or invalid user oetiker@concentric.com (in reply to RCPT TO command))

      I'd personally use the regexp method that GrandFather suggests, because it gives me just the data I'm after with the added benefit of (basic) sanity checks. But if you're going to go with the split method, using a more complex pattern avoids the need for post-split massaging...

      my @result = split /(: |.<|>,)/;
      my($qid,$email,$status) = @result[2,6,10];

      # or

      my($qid,$email,$status) = (split /(: |.<|>,)/)[2,6,10];
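
      The indices 2, 6 and 10 work because split returns the captured separators along with the fields when the pattern contains parentheses. As a rough illustration (not part of the original reply, and using an explicit $line instead of $_), this sketch prints the first eleven elements so the positions can be read off directly:

```perl
use strict;
use warnings;

# The sample line from the original question, joined back into one string.
my $line = 'Jul 6 14:36:41 moe postfix/smtp[15107]: A73DC113B63: '
         . 'to=<oetiker@concentric.com>, relay=adamant.concentric.com[207.155.248.168], '
         . 'delay=17, status=bounced (host adamant.concentric.com[207.155.248.168] said: '
         . '554 <oetiker@concentric.com>: Recipient address rejected: '
         . 'Unknown or invalid user oetiker@concentric.com (in reply to RCPT TO command))';

# The capturing group makes split keep each separator as its own element,
# so fields sit at even indices and separators at odd indices.
my @result = split /(: |.<|>,)/, $line;

printf "%2d: [%s]\n", $_, $result[$_] for 0 .. 10;

my ($qid, $email, $status) = @result[2, 6, 10];
print "$qid, $status, $email\n";
```

      Note that this approach has no sanity check: a line with a different layout still yields a list, just with the wrong things at those indices.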

      And depending on the extent of your Postfix-log parsing needs, you may be able to save yourself some effort by using or bastardizing pflogsumm (assuming you don't find the code too hairy :-)).

          --k.