Dear fellow monks,

After sending out a newsletter to subscribed readers, I found that some email addresses or their hosts no longer exist. I therefore needed a quick parser for the resulting bounce messages, one that ignores "mailbox is full" and other garbage auto-reply messages.

Limbic~Region pointed me to Mail::MboxParser. It provides fast, read-only access to a Unix mailbox, and the module is easy to understand!

$msg->header->{from} and $msg->header->{to} are of no use for this job, because they both show the server the newsletter was sent from. The body has to be parsed for unknown users or hosts. The print statements print the failing email address, preceded by the corresponding message: user unknown or host unknown.

My actual use was slightly different: I connected to the database to flag the emails of the unknown users as unsubscribed, and left the unknown hosts for manual handling, as shown here.

So here is the code :)

#!/usr/bin/perl -w
use strict;
use Mail::MboxParser;

my $mbox = '/var/spool/mail/www37';
my $mb   = Mail::MboxParser->new($mbox, decode => 'ALL');

print "Total messages: ", $mb->nmsgs, "\n";

# iterate through the mailbox
while (my $msg = $mb->next_message) {
    my $body = $msg->body($msg->find_body);
    foreach my $line (split /\n/, $body) {

        # 550 ... Host unknown: the address is the third space-separated field
        if ($line =~ /^550/ && $line =~ / +Host unknown/i) {
            my (undef, undef, $host_addr) = split / /, $line;
            if (defined $host_addr) {
                $host_addr =~ s/^<(.+)>\.\.\.$/$1/;
                print "Host unknown: ", $host_addr, "\n";
                # ...
            }
        }

        # 550 ... User unknown: strip everything but the address
        if ($line =~ /^550/ && $line =~ /User unknown/i) {
            my $user_addr = $line;
            $user_addr =~ s/550 5\.1\.1 (.+) \(.+/$1/;
            $user_addr =~ s/550 5\.1\.1 <(.+)>.+/$1/;
            $user_addr =~ s/550 <(.+)>.+/$1/;
            print "User unknown 550: ", $user_addr, "\n";
            # ...
        }

        # 554: Yahoo-style delivery error
        if ($line =~ /^<<< 554/) {
            my $c = $line;
            $c =~ s/.+to (.+) cannot.+/$1/;
            $c =~ s/.+account \((.+)\) .+/$1/;
            print "User unknown 554: ", $c, "\n";
            # ...
        }
    }
}
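The script above only prints the addresses. The database step mentioned earlier is not shown in the post, so the following is just a rough sketch of how the flagging might look. The table name "subscribers", the columns "email" and "unsubscribed", and the helper name flag_unsubscribed are all assumptions, not the schema actually used.

```perl
use strict;
use warnings;

# Hypothetical helper: mark each distinct address as unsubscribed.
# Assumes a table "subscribers" with columns "email" and "unsubscribed";
# adjust the SQL to the real schema.
sub flag_unsubscribed {
    my ($dbh, @addresses) = @_;

    # A placeholder keeps odd characters in addresses out of the SQL text.
    my $sth = $dbh->prepare(
        'UPDATE subscribers SET unsubscribed = 1 WHERE email = ?'
    );

    # The same address may bounce more than once; handle it only once
    # (compared case-insensitively, which is a simplification).
    my %seen;
    my $count = 0;
    for my $addr (grep { !$seen{lc $_}++ } @addresses) {
        $sth->execute($addr);
        $count++;
    }
    return $count;    # number of distinct addresses processed
}

# Usage (assumes DBI and a suitable driver are installed):
#   use DBI;
#   my $dbh = DBI->connect($dsn, $user, $password, { RaiseError => 1 });
#   flag_unsubscribed($dbh, @unknown_users);
```

Collecting the addresses into an array inside the loop and calling the helper once afterwards keeps the mailbox parsing and the database update separate.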

Again, CPAN is a great help, and this job was fast and fun.
Please point out any inaccuracy! Thanks, maksl
Perhaps I was a bit hasty in looking only at the 550 error and leaving out 551 and others.
Treat the regexes that match only 550 as a proposal. There were also some obscure qmail answers, which even as a human I didn't understand ;)
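One way to widen the net beyond 550 is to accept any 5xx reply code and look for an address in angle brackets. This is only a sketch: the helper name parse_5xx_line and the sample lines are made up for illustration, and real bounce formats vary a lot between mail servers.

```perl
use strict;
use warnings;

# Hypothetical helper: return the reply code and the first
# angle-bracketed address of a 5xx line, or an empty list otherwise.
sub parse_5xx_line {
    my ($line) = @_;

    # A permanent failure starts with 5xx followed by a space or a dash.
    return unless $line =~ /^(5\d\d)[ -]/;
    my $code = $1;

    # Take the first <local@domain> found on the line.
    return unless $line =~ /<([^<>\s]+\@[^<>\s]+)>/;
    return ($code, $1);
}

# Invented sample lines, not taken from a real mailbox.
for my $line (
    '550 5.1.1 <foo@example.com>... User unknown',
    '553 <bar@example.org> mailbox name not allowed',
    '250 ok',
) {
    my ($code, $addr) = parse_5xx_line($line) or next;
    print "$code: $addr\n";
}
```

A caveat for 551 in particular: that reply can carry a forwarding address ("User not local; please try <forward-path>"), so an address found on such a line is not necessarily the one that failed.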

Update:
Important, because of the many existing Yahoo email accounts: I added the lines using the variable $c for the 554 error, since Yahoo likes to answer with this error. Sample:
<<< 554 delivery error: dd Sorry your message to foo@yahoo.com cannot be delivered. This account has been disabled or discontinued [#102]. - mta210.mail.scd.yahoo.com
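The 554 handling works by substitution, so a 554 line that matches neither pattern is printed in full. A capture-based variant avoids that by returning nothing when no address is found. This is a sketch only: the helper name yahoo_554_address is made up, the first pattern is modelled on the sample line, and the second mirrors the "account (...)" substitution in the script without a sample to check it against.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the recipient from a Yahoo-style 554 line.
# Returns undef when the line does not match either known shape.
sub yahoo_554_address {
    my ($line) = @_;
    return unless $line =~ /^<<< 554 /;

    # Shape of the sample: "... message to ADDRESS cannot be delivered ..."
    return $1 if $line =~ /\bto (\S+) cannot be delivered/;

    # Assumed second shape: "... account (ADDRESS) ..."
    return $1 if $line =~ /\baccount \((\S+)\) /;

    return;
}

my $sample = '<<< 554 delivery error: dd Sorry your message to '
           . 'foo@yahoo.com cannot be delivered. This account has been '
           . 'disabled or discontinued [#102]. - mta210.mail.scd.yahoo.com';

my $addr = yahoo_554_address($sample);
print "User unknown 554: $addr\n" if defined $addr;
```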


In reply to Parse a Unix Mailbox for unknown users or hosts by maksl
