Since I am not graced with a scripting brain, and I'm not that great with Perl, I'm having a number of issues trying to report on (Postfix) mail that has originated from certain hosts in our domain and is destined for other specific domains. I have the regexps working (they're probably very ugly, but who cares), but each message gets two message IDs because of our antispam solution, and I have problems associating the two while performing separate tests on each of them.

The line in the maillog that mentions both message ids is like this:

Jun  7 13:47:56 smtpserver postfix/smtp[16346]: A9507208022: to=<a.user@somwhere.gov>, relay=localhost[127.0.0.1], delay=0, status=sent (250 Ok: queued as B68C8208095)

The 11-12 character hex-looking strings are the message IDs. In an earlier line of the maillog, the first message ID tells me which client/host the message originated from:

Jun  7 13:47:56 smtpserver postfix/smtpd[12725]: A9507208022: client=ourhost1.our.domain[172.111.111.111]

The second message ID eventually shows up again in a maillog line which shows the message being accepted by the destination server:

Jun  7 13:47:57 smtpserver postfix/smtp[12379]: B68C8208095: to=<a.user@Somewhere.gov>, relay=server.somwhere.gov[10.0.100.100], delay=1, status=sent (250 ok:  Message 2156237 accepted)
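
The three kinds of line shown above can be told apart with one regex each. This is only a minimal sketch, assuming the log lines always have the shape of the samples; `classify_line` and the type labels (`client`, `requeue`, `delivery`) are hypothetical names, not part of Postfix or of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one maillog line.
# Returns (type, ID, extra), or an empty list if the line is not one of
# the three kinds shown in the samples.
sub classify_line {
    my ($line) = @_;
    my $qid = qr/[0-9A-F]{11,12}/;    # assumption: IDs are 11-12 hex chars

    # smtpd line: which client handed us the message (extra = client name)
    return ( 'client', $1, $2 )
        if $line =~ m{postfix/smtpd\[\d+\]: ($qid): client=([^\[\s]+)};

    # hand-off to the antispam filter on localhost (extra = second ID)
    return ( 'requeue', $1, $2 )
        if $line =~ m{postfix/smtp\[\d+\]: ($qid): to=<[^>]*>, relay=localhost\[.*queued as ($qid)\)};

    # final delivery to the remote server (extra = recipient address)
    return ( 'delivery', $1, $2 )
        if $line =~ m{postfix/smtp\[\d+\]: ($qid): to=<([^>]*)>, .*status=sent};

    return;
}
```

The `requeue` test has to come before the `delivery` test, because the hand-off line also contains `status=sent`.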

If a message ID has not originated from the correct host, I want to discard both associated IDs. If it has originated from the correct hosts, I want to print the line that shows the delivery. I've gotten as far as figuring out I need to run through the maillog a few times to gather the MSGIDs, discard the ones from the incorrect hosts, and finally print the lines I want. I assume slurping each 75MB file might be a bit much (although I'd be happy to try it).

In short, how do I do the magic in the middle? Is the best approach to use a hash to hold both message IDs? Do I use the first message ID as the "key", and delete it if it's from the wrong client? i.e.:

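use strict;
use warnings;

my %msgids;    # first message ID => second (post-antispam) message ID

open( FH, '<', $file ) or die "Cannot open file: $!";
while ( my $line = <FH> ) {
    # our external domain is a subset of the ones we're looking for
    next if $line =~ /to=<.*(our\.domain)/i;
    if ( $line =~ /to=<.*\.gov>.*relay=localhost/i ) {
        my @ids = ( $line =~ /\ ([0-9A-F]{11,12})/g );
        $msgids{ $ids[0] } = $ids[1];
    }
}
close(FH);

open( FH2, '<', $file ) or die "Cannot open file 2nd time: $!";
while ( my $line = <FH2> ) {
    # only the client= lines say where a message came from
    next unless $line =~ /\ ([0-9A-F]{11,12}): client=(\S+)/;
    my ( $id, $client ) = ( $1, $2 );
    # wrong client: drop the first ID and, with it, the second
    delete $msgids{$id} unless $client =~ /^(ourhost1|ourhost2)/;
}
close(FH2);
# then do something to run through the log again and print
# the delivery report line from the second msgid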
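
For comparison, here is a minimal sketch of one way the "magic in the middle" might be done in a single pass, reading line by line so the 75MB file is never slurped. It assumes the three lines for a message always appear in the order shown in the samples (client line, hand-off to localhost, final delivery) and all in the same file, which may not hold across log rotation. `report_deliveries`, `%from_ours` and `%wanted` are hypothetical names, and `ourhost1`/`ourhost2` are the placeholder host names from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: takes an open filehandle on a maillog and returns
# the delivery lines for .gov mail that originated from our hosts.
sub report_deliveries {
    my ($fh) = @_;
    my ( %from_ours, %wanted, @report );
    my $qid = qr/[0-9A-F]{11,12}/;    # assumption: IDs are 11-12 hex chars

    while ( my $line = <$fh> ) {
        # skip mail addressed to our own domain
        next if $line =~ /to=<[^>]*our\.domain/i;

        if ( $line =~ /\s($qid): client=(?:ourhost1|ourhost2)\b/ ) {
            $from_ours{$1} = 1;       # first ID came from one of our hosts
        }
        elsif ( $line =~ /\s($qid): to=<[^>]*\.gov>.*relay=localhost.*queued as ($qid)\)/i ) {
            # remember the second ID only if the first one is known-good
            $wanted{$2} = 1 if $from_ours{$1};
        }
        elsif ( $line =~ /\s($qid): to=<.*status=sent/ && $wanted{$1} ) {
            push @report, $line;      # delivery line for a wanted message
        }
    }
    return @report;
}
```

Usage would be along the lines of `open( my $fh, '<', $file ) or die "Cannot open $file: $!"; print report_deliveries($fh);`. Two small hashes keyed on the IDs replace the nested loop over every key for every line; if the ordering assumption does not hold, the two-pass approach is the safer one.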

Is it better to use a while loop to run through the IDs doing the rest of the matching, or the foreach, or what? Or is it a TIMTOWTDI situation?

TIA for sanity-checking and input.

In reply to Structuring maillog data - hopefully simple question on arrays/hashes by billie_t