Krambambuli has asked for the wisdom of the Perl Monks concerning the following question:

I wonder what would be the recommended module[s] to extract the body from some incoming mail messages.

The 'Perl Cookbook', second edition, recommends MIME::Parser for getting the attachments; however, I'm actually interested _only_ in the body of the main part and don't care about attachments.

So I thought about using good old Mail::Internet. Out of curiosity, I took a look at Email::Simple and MIME::Parser. Finding them potentially interesting too, I went a step further and tried to benchmark them on a simple mail sample.

Which one would be the winner, what are your bets...?

Here are the (at least for me) VERY surprising results:

                Rate mime_parser email_simple mail_internet
mime_parser   39.9/s          --         -89%          -99%
email_simple   355/s        788%           --          -91%
mail_internet 3846/s       9531%         985%            --

And this is the code used:

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw( :all );
use Mail::Internet;
use Email::Simple;
use MIME::Parser;

# Read the sample message from STDIN or from files named on the command line.
my @message;
push( @message, $_ ) while <>;
my $message = join( '', @message );

# Run each parser 1000 times and print a comparison table.
cmpthese(
    1000,
    {
        'mail_internet' => \&_mail_internet,
        'email_simple'  => \&_email_simple,
        'mime_parser'   => \&_mime_parser,
    }
);

exit 0;

sub _mail_internet {
    my $mail_internet = Mail::Internet->new( \@message );
    my $body = join( '', @{ $mail_internet->body } );
}

sub _email_simple {
    my $email_simple = Email::Simple->new($message);
    my $body = $email_simple->body;
}

sub _mime_parser {
    my $parser      = MIME::Parser->new;
    my $entity      = $parser->parse_data($message);
    my $body_handle = $entity->bodyhandle;
    my $body        = $body_handle->as_string;
}

The clear winner seems to be MIME::Parser - despite the fact that I'd have bet on Email::Simple before seeing the numbers.

So, what modules or approaches do other Perl users rely on to parse mail messages?
Am I alone in being surprised by the above benchmarking results?

Re: Parsing mail messages
by xdg (Monsignor) on Feb 17, 2007 at 13:45 UTC

    I think you're reading the benchmarks backwards -- Mail::Internet is about 100 times faster than MIME::Parser. Look at the rate column.
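
    To make the reading of the table concrete, here is a small sketch in core Perl that redoes the arithmetic from the Rate column. The helper names `percent_vs` and `speedup` are made up for this illustration, and the percentage formula (row rate relative to column rate) is my understanding of how cmpthese fills in each cell. Because the printed rates are rounded, the results can differ slightly from the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rates (iterations per second) as printed in the Rate column of the table.
my %rate = (
    mime_parser   => 39.9,
    email_simple  => 355,
    mail_internet => 3846,
);

# Percentage by which $row is faster (positive) or slower (negative)
# than $col. Hypothetical helper, not part of Benchmark.pm.
sub percent_vs {
    my ( $row, $col ) = @_;
    return 100 * $rate{$row} / $rate{$col} - 100;
}

# Plain speed ratio between two entries. Hypothetical helper.
sub speedup {
    my ( $fast, $slow ) = @_;
    return $rate{$fast} / $rate{$slow};
}

printf "mail_internet is about %.0f times faster than mime_parser\n",
    speedup( 'mail_internet', 'mime_parser' );
printf "mime_parser relative to mail_internet: %.0f%%\n",
    percent_vs( 'mime_parser', 'mail_internet' );
```

    A higher rate means more iterations per second, so the entry with the largest number in the Rate column is the fastest. A negative percentage in a row means that row's entry is slower than the column's entry.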

    For most tasks involving incoming emails "from the wild", I use Mail::Box (along with Mail::Box::Parser::C). It is generally not the fastest, but it's designed to be very robust against the kinds of pathologically formatted emails that spammers send.

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      Oops... I've definitely read it upside down. So Mail::Internet seems to rule, no big surprise.

      I'm dealing with registration forms coming from a bunch of different, but known sources/sites, so the messages are simple and not expected to contain anything weird.

      Many thanks!
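
      For messages as simple as the registration forms described above, the body is everything after the first empty line, so it could in principle be extracted without any module at all. The following is only a minimal sketch under strong assumptions: a single-part message, no MIME multipart structure, and no transfer encoding (no base64 or quoted-printable). The helper name `simple_body`, the sample address, and the sample fields are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the body of a plain, single-part message.
# Assumes no MIME multipart structure and no transfer encoding.
sub simple_body {
    my ($raw) = @_;

    # The header block ends at the first empty line (LF or CRLF endings).
    my ( undef, $body ) = split /\r?\n\r?\n/, $raw, 2;

    # A message without an empty line has no body.
    return defined $body ? $body : '';
}

# Made-up sample message for demonstration purposes.
my $sample = join "\n",
    'From: form@example.com',
    'Subject: Registration',
    '',
    'name=Jane Doe',
    'city=Vienna',
    '';

print simple_body($sample);
```

      This is not a replacement for a real parser: anything multipart, encoded, or malformed needs one of the modules discussed in this thread.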
Re: Parsing mail messages
by Anonymous Monk on Feb 17, 2007 at 13:00 UTC