vxp has asked for the wisdom of the Perl Monks concerning the following question:

Suppose you have the following input (MIME encoded):

    --_000_200907060005UAA14932pisas291mscom88clm_
    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    line1
    line2
    line3
    --_000_200907060005UAA14932pisas291mscom88clm_
    Content-Type: text/html; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
    <HTML>
    <HEAD>
    ...insert random MIME-ed html code here
    --_000_200907060005UAA14932pisas291mscom88clm_--
    --_004_38D25DCAD7370B4FACA079E2FAA2C690B02CB5NYWEXMB24msadmsco_--

Also, the order of the parts is not consistent: sometimes the text/plain part comes first and the text/html part second, and sometimes it's the other way around.

I mention this because it rules out parsing the message by reading the input from the top until you see "Content-Type: text/html" and placing that into a $txt_section, for instance, and then reading the file from the bottom (with File::ReadBackwards, for instance) and placing that into a $html_section or something.

So I'm looking for a CPAN module, I assume, that I can instruct to "get me the 'text/plain' section out of this input" or "place the 'text/html' section of that input into @array".

Any suggestions for such a module, or maybe an alternative solution?

Re: MIME voodoo.
by zwon (Abbot) on Jul 16, 2009 at 18:32 UTC

    You can do it using Email::MIME: it will parse the message and return a list of the parts it finds; then you just check the content type of every part and take what you want. No voodoo. Also, http://emailproject.perl.org/ may be interesting for you.
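A minimal sketch of the approach zwon describes, assuming Email::MIME is installed from CPAN. The helper name find_part and the sample message are made up for illustration; subparts, content_type and body are Email::MIME methods. Because the search goes by type rather than position, the order of the parts does not matter.

```perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical helper: depth-first search for the first leaf part whose
# media type matches $wanted (for example 'text/plain').
sub find_part {
    my ($part, $wanted) = @_;
    my @children = $part->subparts;

    if (!@children) {
        # Leaf part: compare the type, ignoring any "; charset=..." suffix.
        my $type = $part->content_type // '';
        return $part if $type =~ m{^\s*\Q$wanted\E\s*(?:;|\z)}i;
        return;
    }

    for my $child (@children) {
        my $hit = find_part($child, $wanted);
        return $hit if $hit;
    }
    return;
}

# Made-up sample message. Note the top-level Content-Type header that
# declares the boundary, and that the HTML part comes first here.
my $raw = <<'END_MESSAGE';
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="demo"

--demo
Content-Type: text/html; charset="iso-8859-1"

<p>line1</p>
--demo
Content-Type: text/plain; charset="iso-8859-1"

line1
--demo--
END_MESSAGE

my $email = Email::MIME->new($raw);
my $text  = find_part($email, 'text/plain');
my $html  = find_part($email, 'text/html');

print "plain: ", $text->body if $text;
print "html:  ", $html->body if $html;
```

To get the lines of a part into an array, something like `my @lines = split /\n/, $text->body;` should do.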

      Took a quick look and noticed something strange.

      Here's the input that I fed into it:

          --_000_200907060005UAA14932pisas291mscom88clm_
          Content-Type: text/plain; charset="iso-8859-1"
          Content-Transfer-Encoding: quoted-printable

          line1
          line2
          line3
          --_000_200907060005UAA14932pisas291mscom88clm_
          Content-Type: text/html; charset="iso-8859-1"
          Content-Transfer-Encoding: quoted-printable

          <p>blah</p>
          <p>blah2</p>
          --_000_200907060005UAA14932pisas291mscom88clm_--
          --_004_38D25DCAD7370B4FACA079E2FAA2C690B02CB5NYWEXMB24msadmsco_--

      Here's the code I came up with:

          #!/usr/bin/perl
          use Email::MIME;

          $file = shift;
          local( $/, *FILE ) ;
          open(FILE, $file);
          $message = <FILE>;
          close(FILE);

          my $parsed = Email::MIME->new($message);
          my @parts = $parsed->parts; # These will be Email::MIME objects, too.
          my $p = $parts[1]->content_type;
          print $p;

      That is pretty much from the module's documentation. Here's the result it produces when I run it:

          $ ./grab.pl mime2
          Can't call method "content_type" on an undefined value at ./grab.pl line 16.
          $

      Shouldn't $parts[1] contain the second MIME part in my input (the html one)? Why is it undef? And if I use @parts in scalar context, it claims there's only 1 MIME part in that input, which is also a blatant lie. :)

      $parts[0] seems to contain what it's supposed to: it returns 'text/plain; charset="iso-8859-1"', as it should. Where'd $parts[1] go? :D

      Did I misinterpret the documentation somehow?

        Your input isn't a correct MIME message. There should be a header that defines the boundary, and it's missing. Here's a small and dirty (sorry...) example:

            use strict;
            use warnings;
            use 5.010;
            use MIME::Lite;
            use Email::MIME;

            my $msg = MIME::Lite->new(
                From    => 'src@example.com',
                To      => 'dst@example.com',
                Subject => 'message',
                Type    => 'multipart/alternative',
            );
            $msg->attach(
                Type => 'text/plain',
                Data => 'this is text content',
            );
            $msg->attach(
                Type => 'text/html',
                Data => 'this is <b>html</b> content',
            );
            my $msg_str = $msg->as_string;
            print $msg_str; # note the output here -- that's a complete message

            my $parsed = Email::MIME->new($msg_str);
            say "*" x 50;
            if ($parsed->content_type =~ m{^multipart/alternative}) {
                say get_text_parts($parsed)->body;
            }

            sub get_text_parts {
                my @parts = shift->parts;
                my %ct;
                $ct{$_->content_type} = $_ for @parts;
                return $ct{'text/plain'} if exists $ct{'text/plain'};
                return $ct{'text/html'} if exists $ct{'text/html'};
                return $parts[0];
            }
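The missing-header diagnosis can be tried directly on the fragment from the question. Without a header declaring a boundary, Email::MIME treats the input as a single-part message, and its parts method then hands back the message itself, which would explain why @parts held only one element. The sketch below is one possible workaround, not part of either module: wrap_fragment is a hypothetical helper that reads the boundary off the first line and prepends a header, and the multipart/alternative type is an assumption, since the real top-level header was not posted.

```perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical helper: read the boundary off the first "--boundary" line of a
# headerless fragment and prepend the header that should have declared it.
# Assumption: the container type is multipart/alternative.
sub wrap_fragment {
    my ($fragment) = @_;
    my ($boundary) = $fragment =~ /\A\s*--(\S+)/
        or die "fragment does not start with a boundary line\n";
    return qq{MIME-Version: 1.0\n}
         . qq{Content-Type: multipart/alternative; boundary="$boundary"\n\n}
         . $fragment;
}

# The fragment from the question, laid out one header per line.
my $fragment = <<'END_FRAGMENT';
--_000_200907060005UAA14932pisas291mscom88clm_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

line1
line2
line3
--_000_200907060005UAA14932pisas291mscom88clm_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<p>blah</p>
<p>blah2</p>
--_000_200907060005UAA14932pisas291mscom88clm_--
--_004_38D25DCAD7370B4FACA079E2FAA2C690B02CB5NYWEXMB24msadmsco_--
END_FRAGMENT

my $parsed = Email::MIME->new(wrap_fragment($fragment));
my @parts  = $parsed->subparts;

printf "%d parts\n", scalar @parts;
print $_->content_type, "\n" for @parts;
```

If the complete original message is available, parsing that is the better option, since it carries the real header and any outer multipart layer (the stray _004_ boundary at the end suggests there was one).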

        Upd: minor fix

        Upd: Note also that $_->content_type may return something like text/plain; charset=utf-8, and the exact-key lookups in get_text_parts will miss the part in that case.
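One way around the charset caveat is to strip the parameters before using the content type as a hash key. The sketch below uses core Perl only; media_type and get_text_part are hypothetical names, and Demo::Part is a stand-in for an Email::MIME part so that the example runs without the module installed.

```perl
use strict;
use warnings;
use 5.010;

# Reduce 'text/plain; charset=utf-8' to 'text/plain' (lower-cased).
sub media_type {
    my ($content_type) = @_;
    return '' unless defined $content_type;
    my ($type) = $content_type =~ m{^\s*([^;\s]+)};
    return lc($type // '');
}

# Variant of get_text_parts keyed on the bare media type.
# Works on any objects that provide a content_type method.
sub get_text_part {
    my @parts = @_;
    my %by_type;
    $by_type{ media_type($_->content_type) } //= $_ for @parts;
    return $by_type{'text/plain'} // $by_type{'text/html'} // $parts[0];
}

{
    package Demo::Part;    # hypothetical stand-in for an Email::MIME part
    sub new          { my ($class, $ct) = @_; return bless { ct => $ct }, $class }
    sub content_type { return $_[0]{ct} }
}

my @parts = map { Demo::Part->new($_) }
    'text/html; charset=utf-8',
    'Text/Plain; charset="iso-8859-1"';

say get_text_part(@parts)->content_type;
```

With real messages, the same function could be called as get_text_part($parsed->parts).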