Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am writing a script which reads an email in (via STDIN) and sticks it in a database - I posted a node on how to approach this a while back but it seemed the only response was to use modules - I have checked them out and I really have no idea how to incorporate them into my script. What I am trying to do is simply pipe an email to my script (which is done) and the script to extract the From: email and the Subject: and Message:

I am having problems though. I am using the following code to get the From: field, but when the From field is something like JOHN <ad@add.com> it fails, as I only want the email address, not the 'name'. If anyone can help me with any suggestions I would greatly appreciate it - I have seen examples of using modules MAIL and MIME and tried it but I simply don't know where to start ... Here is my code. Many Thanks
while (<STDIN>) {
    last if /^$/;
    $from = "$1" if /^From:(.+)$/;
    $subject = "$1" if /^Subject:(.+)$/;
}
while (<STDIN>) {
    push @message, $_;
}

Edit Masem 2001-11-13 - Fixed the example address in the text descriptive with CODE tags

Re: Email Header
by McD (Chaplain) on Nov 13, 2001 at 22:12 UTC
    Anonymous monk writes:

    I posted a node on how to approach this a while back but it seemed the only response was to use modules - I have checked them out and I really have no idea how to incorporate them into my script.

    First things first:

    The problem you're approaching is deceptively simple. The more email you encounter "in the wild," the more you'll find exception cases to even seemingly safe assumptions.

    Learning how to incorporate and use modules is pretty much essential if this is to be anything more than a quick and dirty hack. The CPAN modules that handle email are outstanding, and they'll handle a wealth of bizarre conditions that you're almost guaranteed to overlook. I'll provide some pointers here, but I strongly suggest a long, meaningful session with The Camel and an effort to come up to speed on this stuff.

    What I am trying to do is simply pipe an email to my script (which is done) and the script to extract the From: email and the Subject: and Message:

    Right away, you have to consider that there may not BE any From: or Subject: line in your email, and have your design take this into account.

    Also, there may be MORE THAN ONE of each of these lines, in which case you might make the design decision to treat the first one you come across as authoritative, but that's also up to you.
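
A minimal sketch of that design decision, using only core Perl: it collects every header into a hash of arrays, so that missing and repeated fields are both visible, and then treats the first occurrence as authoritative. The helper names `parse_headers` and `first_header` and the sample message are made up for illustration; this is a way to see the problem, not a substitute for the CPAN modules.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw header block into name => [values].
# Header names are lowercased; folded (continuation) lines are unfolded.
sub parse_headers {
    my ($raw) = @_;
    my (%headers, $current);
    for my $line (split /\r?\n/, $raw) {
        last if $line eq '';    # a blank line ends the header section
        if (defined $current && $line =~ /^\s+(.*)$/) {
            # Continuation line: append to the most recent value.
            $headers{$current}[-1] .= " $1";
        }
        elsif ($line =~ /^([^:\s]+):\s*(.*)$/) {
            $current = lc $1;
            push @{ $headers{$current} }, $2;
        }
    }
    return \%headers;
}

# Hypothetical helper: first value of a header, or undef if it is absent.
sub first_header {
    my ($headers, $name) = @_;
    my $values = $headers->{ lc $name };
    return $values ? $values->[0] : undef;
}

# Sample message, assumed for illustration: two Subject lines, no Reply-To.
my $raw = join "\n",
    'From: JOHN <ad@add.com>',
    'Subject: first subject',
    'Subject: second subject',
    '',
    'body text';

my $h        = parse_headers($raw);
my $subject  = first_header($h, 'Subject');     # first occurrence wins
my $reply_to = first_header($h, 'Reply-To');    # undef, header is absent

print "Subject: ", (defined $subject ? $subject : '(none)'), "\n";
print "Subject headers seen: ", scalar @{ $h->{subject} }, "\n";
print "Reply-To: ", (defined $reply_to ? $reply_to : '(none)'), "\n";
```

Keeping every value rather than overwriting means the "first one wins" rule is a single line that can be changed later without touching the parsing.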

    And in the case of the From: line, there may be multiple addresses on it, formatted in multiple ways.

    That's not even taking into account the differences between envelope (SMTP) FROM addresses and the RFC 2822 header fields in the message itself, or fields like "Sender" or "Reply-To", also commonly seen in the mail.

    Incidentally, if you're looking for a good, unique, "Key" field in the headers to use in your database, you might consider "Message-Id" - it should be unique on a per mail message basis. Of course, we all know you can get more than one copy of an email...

    Finally, it looks like you're storing off the message body after you find the headers you're looking for - that's all well and good UNLESS the mail is a MIME formatted message (as so many are today), in which case you can't just discard the headers without losing critical MIME headers.

    So, with all those massive hurdles placed in your way, here's some code:

    use MIME::Parser;
    use Mail::Address;

    my $parser = MIME::Parser->new;
    my $MIME_entity;

    # Trap parse failures; note the semicolon that ends the eval statement
    eval { $MIME_entity = $parser->parse(\*STDIN); };
    if ($@) {
        print "Trouble parsing mail:\n";
        print $parser->results->msgs();
    }
    else {
        my $header = $MIME_entity->head();

        my $subject = $header->get('Subject');    # Assumes one subject header
        print "No Subject header found!\n" unless defined ($subject);

        my $from = $header->get('From');          # Assumes one From header
        print "No From header found!\n" unless defined ($from);

        my @from_addresses = Mail::Address->parse($from);
        my $address;
        if (@from_addresses) {
            # Assumes 1st address is the only one we care about
            $address = $from_addresses[0]->address();
        }
        else {
            print "No address found in from line!\n";
        }

        print "Subject is $subject\nMail is from $address\n";

        # See the MIME::Entity manpage for clues on how to muck about with
        # the body of the email here
    }
    $parser->filer->purge();
    Please note that I haven't tested this code, it's subject to the usual bugs, typos, etc. - I present it only as a pattern for you to build on.

    I have seen examples of using modules MAIL and MIME and tried it but I simply don't know where to start ...

    Here are some links.

    The MIME module homepages:
    http://www.zeegee.com/code/perl/MIME-tools/

    Relevant RFC at the Internet Mail Consortium's site:
    http://www.imc.org/rfc2822

    Many Thanks

    You're welcome. "Simple things ought to be simple" - email, as it turns out, is not simple. :-) Good hunting.

    Peace,
    -McD

      Thanks McD, that's very helpful and is really appreciated.

      To get the message body at the bottom there, would I use something like:
          $bodyh = $ent->bodyhandle;
      Thanks again
        I was just installing the module MIME::Parser but I get an error saying it's not on CPAN :(
Re: Email Header
by mischief (Hermit) on Nov 13, 2001 at 19:44 UTC
    Try this to grab the email address from $from as $email:

    my ($email) = $from =~ /<([^>]+)>/ if $1;

    Might be an idea to have a look at the Mail::Header module.

    By the way, and this isn't relevant to your question, you don't need to quote $1 (e.g. $from = "$1" in your example above).

      Thanks for your suggestions, for some reason that last one only returns a blank field
        What's the value of $from?
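
A likely explanation for the blank result is the trailing `if $1` on the one-liner above. A statement modifier is evaluated before the statement it guards, so `$1` is tested before the match has run and holds whatever an earlier match left behind, which may be nothing. Dropping the modifier and matching in list context avoids that. The sketch below assumes a sample value for `$from`, and the helper name `extract_address` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the address out of a From: field value.
sub extract_address {
    my ($from) = @_;

    # In list context a match returns its captures on success and an
    # empty list on failure, so $email is undef when there is no <...>.
    my ($email) = $from =~ /<([^>]+)>/;
    return $email if defined $email;

    # No angle brackets: fall back to the field with whitespace trimmed.
    (my $bare = $from) =~ s/^\s+|\s+$//g;
    return $bare;
}

print extract_address('JOHN <ad@add.com>'), "\n";    # ad@add.com
print extract_address('  ad@add.com  '), "\n";       # ad@add.com
```

This still handles only the two simplest address forms; comments, quoted names containing angle brackets, and address lists need a real parser such as Mail::Address.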
Re: Email Header
by Anonymous Monk on Nov 13, 2001 at 21:18 UTC
    Here's some code that I use to do what you're looking for:
    if ( $line =~ /^From: (.+?)$/i ) {
        $newFrom = $1;
        $newFrom = $1 if $newFrom =~ /<(\S+)>/;
    }
      Thanks, that last one got the From email address, which is great, but now I have two other small problems :) getting the subject and the actual message. Here is my code (the @message array picks up all the MIME- lines):
      while (<STDIN>) {
          last if /^$/;
          if ( $_ =~ /^From: (.+?)$/i ) {
              $newFrom = $1;
              $newFrom = $1 if $newFrom =~ /<(\S+)>/;
          }
          $subject = "$1" if /^Subject:(.+)$/;
      }
      while (<STDIN>) {
          push @message, $_;
      }
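
One way the loop above could be adjusted, as a sketch rather than a definitive fix: match the header names case-insensitively, trim the whitespace around the captured values, keep only the first From: and Subject: seen, and read the body in one go. The helper name `read_message` and the sample message are assumptions for illustration, and an in-memory filehandle stands in for STDIN. It deliberately does not unfold continuation lines or decode MIME parts, so a multipart message will still show its MIME boundary and part-header lines in the body.

```perl
use strict;
use warnings;

# Hypothetical helper: read one message from a filehandle and return
# the sender address, the subject, and a reference to the body lines.
sub read_message {
    my ($fh) = @_;
    my ($from, $subject);

    # Header section: everything up to the first blank line.
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';
        if (!defined $from && $line =~ /^From:\s*(.*?)\s*$/i) {
            $from = $1;
            $from = $1 if $from =~ /<([^>]+)>/;    # keep only the address
        }
        elsif (!defined $subject && $line =~ /^Subject:\s*(.*?)\s*$/i) {
            $subject = $1;
        }
    }

    # Body section: the rest of the input, left untouched.
    my @message = <$fh>;
    return ($from, $subject, \@message);
}

# Sample message, assumed for illustration.
my $mail = "From: JOHN <ad\@add.com>\nSubject: Hello there\n\nLine one\nLine two\n";
open my $fh, '<', \$mail or die "Cannot open in-memory handle: $!";
my ($from, $subject, $body) = read_message($fh);
close $fh;

print "From: $from\nSubject: $subject\n", @$body;
```

In the real script the call would be `read_message(\*STDIN)`, with checks for an undefined `$from` or `$subject` before anything is written to the database.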
Re: Email Header
by Gerryjun (Scribe) on Nov 13, 2001 at 18:29 UTC
    Try using this instead if it helps.
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs) {
        ($names, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        if ($INPUT{$names}) {
            $INPUT{$names} = $INPUT{$names}.",".$value;
        }
        else {
            $INPUT{$names} = $value;
        }
    }
      That's a rather broken CGI-parameter POST reader, not an RFC 822 email message header parser. It probably won't help. :)

      On a side note, as that snippet has several flaws visible in ten seconds (does not verify the content length, is subject to large POST denial of service attacks, does not appear to handle properly encoded entities, ignores the valid ';' separator for form values, and does not allow '0' as a valid form value), I would advise against using it even when accepting CGI parameters. use CGI or die; offers a better suggestion.