Anonymous monk writes:

I posted a node on how to approach this a while back, but it seemed the only response was to use modules. I have checked them out and I really have no idea how to incorporate them into my script.

First things first:

The problem you're approaching is deceptively simple. The more email you encounter "in the wild," the more you'll find exception cases to even seemingly safe assumptions.

Learning how to incorporate and use modules is pretty much essential if this is to be anything more than a quick and dirty hack. The CPAN modules that handle email are outstanding, and they'll handle a wealth of bizarre conditions that you're almost guaranteed to overlook. I'll provide some pointers here, but I strongly suggest a long, meaningful session with The Camel and an effort to come up to speed on this stuff.

What I am trying to do is simply pipe an email to my script (which is done) and have the script extract the From: email and the Subject: and Message:

Right away, you have to consider that there may not BE any From: or Subject: line in your email, and have your design take this into account.

Also, there may be MORE THAN ONE of each of these lines, in which case you might make the design decision to treat the first one you come across as authoritative, but that's also up to you.
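To make the missing and duplicated header cases concrete, here is a minimal sketch. The sample message is made up, and keeping the parse in memory with output_to_core(1) is my choice for the demo, not something your script needs.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical message: Subject appears twice, From is missing entirely.
my $raw = join "\n",
    'Subject: first subject',
    'Subject: second subject',
    'To: someone@example.com',
    '',
    'Body text.',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep everything in memory, no temp files
my $head = $parser->parse_data($raw)->head;

# count() reports how many times a header occurs (0 if absent).
my $subject_count = $head->count('Subject');
my $from_count    = $head->count('From');

# Design decision: treat the first occurrence (index 0) as authoritative.
my $subject = $head->get('Subject', 0);
chomp $subject if defined $subject;    # values keep their trailing newline

# A missing header comes back as undef, so check before using it.
my $from = $head->get('From', 0);

printf "Subject headers: %d, using: %s\n", $subject_count, $subject // '(none)';
printf "From headers: %d, using: %s\n",    $from_count,    $from    // '(none)';
```

Whether you take the first occurrence, the last, or reject the mail outright is a policy question; the point is only that the code has to pick one deliberately.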

And in the case of the From: line, there may be multiple addresses on it, formatted in multiple ways.
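As a rough illustration of why hand-rolled regexes struggle here, this sketch feeds Mail::Address a made-up From: value containing two addresses in two different styles, one of them with a comma inside a quoted display name.

```perl
use strict;
use warnings;
use Mail::Address;

# Hypothetical From: value holding two addresses in different styles.
my $from = '"Doe, John" <john@example.com>, jane@example.org (Jane Roe)';

# parse() returns one Mail::Address object per address it finds.
my @parsed    = Mail::Address->parse($from);
my @addresses = map { $_->address } @parsed;

print "$_\n" for @addresses;
```

Splitting that string on commas would cut the first display name in half, which is exactly the kind of case the module exists to handle.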

That's not even taking into account the differences between envelope (SMTP) FROM addresses and the RFC 2822 header fields in the message itself, or fields like "Sender" or "Reply-To", also commonly seen in the mail.

Incidentally, if you're looking for a good, unique, "Key" field in the headers to use in your database, you might consider "Message-Id" - it should be unique on a per mail message basis. Of course, we all know you can get more than one copy of an email...
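A sketch of that idea, with a plain hash standing in for whatever unique index your database provides. The helper name new_message_id and the sample mail are invented for illustration.

```perl
use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep everything in memory, no temp files

my %seen;    # stands in for a unique index in the real database

# Hypothetical helper: returns the Message-ID if the mail is new, undef if
# it is a repeat or has no Message-ID at all (the caller must decide how to
# treat that second case).
sub new_message_id {
    my ($raw) = @_;
    my $id = $parser->parse_data($raw)->head->get('Message-ID', 0);
    return unless defined $id;
    $id =~ s/^\s+|\s+$//g;     # strip surrounding whitespace and newline
    return if $seen{$id}++;    # already stored once
    return $id;
}

my $mail   = "Message-ID: <abc123\@example.com>\nSubject: hi\n\nBody\n";
my $first  = new_message_id($mail);
my $second = new_message_id($mail);    # same mail delivered again

print defined $first  ? "stored $first\n"  : "skipped\n";
print defined $second ? "stored $second\n" : "skipped duplicate\n";
```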

Finally, it looks like you're storing off the message body after you find the headers you're looking for - that's all well and good UNLESS the mail is a MIME formatted message (as so many are today), in which case you can't just discard the headers without losing critical MIME headers.
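To show what "the body" means once MIME is involved, here is a minimal sketch that walks a parsed message and collects only the text/plain parts. The function name plain_text_parts and the sample two-part message are assumptions for the example; real mail will also have attachments, nested multiparts and character sets to think about.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: recursively collect the decoded text of every
# text/plain part in a MIME::Entity.
sub plain_text_parts {
    my ($entity) = @_;
    my @parts = $entity->parts;
    return map { plain_text_parts($_) } @parts if @parts;   # multipart: recurse
    return unless $entity->mime_type eq 'text/plain';       # skip other types
    my $body = $entity->bodyhandle or return;
    return $body->as_string;
}

# Made-up message with a plain text and an HTML alternative.
my $raw = <<'END';
From: alice@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain

Hello in plain text.
--XYZ
Content-Type: text/html

<p>Hello in HTML.</p>
--XYZ--
END

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep everything in memory, no temp files
my $entity = $parser->parse_data($raw);
my @texts  = plain_text_parts($entity);

print "Found ", scalar(@texts), " text/plain part(s)\n";
print "$_\n" for @texts;
```

Note that the Content-Type header and its boundary are what make the split possible, which is why throwing the headers away first leaves you with an undecodable blob.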

So, with all those massive hurdles placed in your way, here's some code:

use strict;
use warnings;
use MIME::Parser;
use Mail::Address;

my $parser = MIME::Parser->new;
my $MIME_entity;

# parse() dies on a fatal error, so trap it with eval
eval {
  $MIME_entity = $parser->parse(\*STDIN);
};

if ($@) {
  print "Trouble parsing mail:\n";
  print $parser->results->msgs();

} else {
  my $header = $MIME_entity->head();

  # Index 0 = first Subject header, which we treat as authoritative
  my $subject = $header->get('Subject', 0);
  if (defined $subject) {
    chomp $subject;   # header values keep their trailing newline
  } else {
    print "No Subject header found!\n";
    $subject = '';
  }

  my $from = $header->get('From', 0); # Likewise, first From header only
  print "No From header found!\n" unless defined ($from);

  # Only parse the From line if there actually is one
  my @from_addresses = defined $from ? Mail::Address->parse($from) : ();
  my $address = '';
  if (@from_addresses) {
    # Assumes 1st address is the only one we care about
    $address = $from_addresses[0]->address();
  } else {
    print "No address found in from line!\n";
  }

  print "Subject is $subject\nMail is from $address\n";

  # See the MIME::Entity manpage for clues on how to muck about with
  # the body of the email here

}

# Clean up any temporary files the parser wrote to disk
$parser->filer->purge();

Please note that I haven't tested this code, it's subject to the usual bugs, typos, etc. - I present it only as a pattern for you to build on.

I have seen examples of using modules MAIL and MIME and tried it but I simply don't know where to start ...

Here are some links.

The MIME module homepages:
http://www.zeegee.com/code/perl/MIME-tools/

Relevant RFC at the Internet Mail Consortium's site:
http://www.imc.org/rfc2822

Many Thanks

You're welcome. "Simple things ought to be simple" - email, as it turns out, is not simple. :-) Good hunting.

Peace,
-McD

In reply to Re: Email Header by McD
in thread Email Header by Anonymous Monk
