Anonymous monk writes:
I posted a node on how to approach this a while back but it seemed the only response was to use modules - I have checked them out and I really have no idea how to incorporate them into my script.
First things first:
The problem you're approaching is deceptively simple. The more email you encounter "in the wild," the more you'll find exception cases to even seemingly safe assumptions.
Learning how to incorporate and use modules is pretty much essential if this is to be anything more than a quick and dirty hack. The CPAN modules that handle email are outstanding, and they'll handle a wealth of bizarre conditions that you're almost guaranteed to overlook. I'll provide some pointers here, but I strongly suggest a long, meaningful session with The Camel and an effort to come up to speed on this stuff.
What I am trying to do is simply pipe an email to my script (which is done) and the script to extract the From: email and the Subject: and Message:
Right away, you have to consider that there may not BE any From: or Subject: line in your email, and have your design take this into account.
Also, there may be MORE THAN ONE of each of these lines, in which case you might make the design decision to treat the first one you come across as authoritative, but that's also up to you.
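As a rough sketch of how repeated header lines show up, the snippet below builds a Mail::Header object (the parent class of the MIME::Head object that MIME::Parser hands back) from a few made-up lines. The header lines are invented for illustration, and treating the first occurrence as authoritative is only one possible policy.

```perl
use strict;
use warnings;
use Mail::Header;

# Made-up header lines with a duplicated Subject: field
my @lines = (
    "From: jane\@example.com\n",
    "Subject: first subject\n",
    "Subject: second subject\n",
);
my $head = Mail::Header->new(\@lines);

# In list context get() returns every occurrence of the field
my @subjects = $head->get('Subject');
chomp @subjects;

# One possible policy: treat the first occurrence as authoritative
my $subject = @subjects ? $subjects[0] : undef;

printf "%d Subject line(s) found\n", scalar @subjects;
print "Using: $subject\n" if defined $subject;
```

In scalar context get() hands back only the first occurrence, which is why the longer example further down can get away with a plain scalar assignment.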
And in the case of the From: line, there may be multiple addresses on it, formatted in multiple ways.
That's not even taking into account the differences between envelope (SMTP) FROM addresses and the RFC 2822 header fields in the message itself, or fields like "Sender" or "Reply-To", also commonly seen in mail.
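To make that concrete, here is a small sketch of Mail::Address pulling apart a From: line that carries two addresses written in two different styles. The sample line and the addresses in it are hypothetical.

```perl
use strict;
use warnings;
use Mail::Address;

# Hypothetical From: value: one "Name <addr>" form, one "addr (comment)" form
my $from_line = 'Jane Doe <jane@example.com>, bob@example.org (Bob)';

# parse() returns one Mail::Address object per address it finds
my @addresses = Mail::Address->parse($from_line);

for my $addr (@addresses) {
    # address() gives the bare mailbox, whichever format it arrived in
    print $addr->address, "\n";
}
```

Whether you keep only the first address or store them all is a design decision, much like the duplicate-header question above.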
Incidentally, if you're looking for a good, unique, "Key" field in the headers to use in your database, you might consider "Message-Id" - it should be unique on a per mail message basis. Of course, we all know you can get more than one copy of an email...
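A minimal sketch of using that field as a key might look like the following. It assumes you have already fetched the value with something like $header->get('Message-Id'); the %seen hash and the is_duplicate() helper are hypothetical stand-ins for a unique column in your real database.

```perl
use strict;
use warnings;

my %seen;    # stand-in for a unique key column in the database

# Returns 1 if this Message-Id has been recorded before, 0 otherwise.
# A missing header can't be matched against anything, so it counts as new.
sub is_duplicate {
    my ($message_id) = @_;
    return 0 unless defined $message_id;
    $message_id =~ s/^\s+|\s+$//g;    # trim whitespace and the trailing newline
    return $seen{$message_id}++ ? 1 : 0;
}

for my $id ("<abc\@example.com>\n", "<abc\@example.com>", undef) {
    my $label = defined $id ? $id : '(no Message-Id)';
    $label =~ s/\s+$//;
    print "$label: ", (is_duplicate($id) ? "duplicate" : "new"), "\n";
}
```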
Finally, it looks like you're storing off the message body after you find the headers you're looking for - that's all well and good UNLESS the mail is a MIME-formatted message (as so many are today), in which case you can't just discard the headers without losing critical MIME headers.
So, with all those massive hurdles placed in your way, here's some code:
use strict;
use warnings;
use MIME::Parser;
use Mail::Address;

my $parser = MIME::Parser->new;
my $MIME_entity;
eval {
    $MIME_entity = $parser->parse(\*STDIN);
};    # eval { ... } is an expression, so the statement needs a semicolon
if ($@) {
    print "Trouble parsing mail:\n";
    print $parser->results->msgs();
} else {
    my $header = $MIME_entity->head();

    my $subject = $header->get('Subject');    # Assumes one subject header
    print "No Subject header found!\n" unless defined($subject);

    my $from = $header->get('From');          # Assumes one From header
    print "No From header found!\n" unless defined($from);

    # Only parse the From line if there is one
    my @from_addresses = defined($from) ? Mail::Address->parse($from) : ();
    my $address;
    if (@from_addresses) {
        # Assumes 1st address is the only one we care about
        $address = $from_addresses[0]->address();
    } else {
        print "No address found in from line!\n";
    }

    # get() keeps the trailing newline; placeholders avoid undef warnings
    chomp($subject) if defined($subject);
    $subject = '(none)'    unless defined($subject);
    $address = '(unknown)' unless defined($address);
    print "Subject is $subject\nMail is from $address\n";

    # See the MIME::Entity manpage for clues on how to muck about with
    # the body of the email here
}

# Remove any temporary files the parser wrote while decoding
$parser->filer->purge();
Please note that I haven't tested this code; it's subject to the usual bugs, typos, etc. I present it only as a pattern for you to build on.
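For the body itself, here's one possible way to dig the plain-text parts out of a parsed message, again only as a sketch. The two-part sample message and the plain_text_parts() helper are made up for illustration, and output_to_core(1) is used so that nothing gets written to disk.

```perl
use strict;
use warnings;
use MIME::Parser;

# Collect the decoded text/plain bodies, recursing into multipart entities
sub plain_text_parts {
    my ($entity) = @_;
    my @found;
    my @parts = $entity->parts;
    if (@parts) {
        push @found, plain_text_parts($_) for @parts;
    } elsif ($entity->effective_type eq 'text/plain') {
        my $body = $entity->bodyhandle;
        push @found, $body->as_string if $body;
    }
    return @found;
}

# Hypothetical two-part message, here only to exercise the helper
my $raw = <<'END_MAIL';
From: jane@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain

Hello in plain text.
--XYZ
Content-Type: text/html

<p>Hello in HTML.</p>
--XYZ--
END_MAIL

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory, no temp files
my $entity = $parser->parse_data($raw);

my @texts = plain_text_parts($entity);
print "Plain-text part: $_\n" for @texts;
```

With in-core output there is nothing for the filer to purge afterwards; for large attachments you would probably want the default on-disk behaviour instead.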
I have seen examples of using modules MAIL and MIME and tried it but I simply don't know where to start ...
Here are some links.
The MIME-tools module homepage:
http://www.zeegee.com/code/perl/MIME-tools/
Relevant RFC at the Internet Mail Consortium's site:
http://www.imc.org/rfc2822
Many Thanks
You're welcome. "Simple things ought to be simple" - email, as it turns out, is not simple. :-) Good hunting.
Peace,
-McD