MIME voodoo.

vxp has asked for the wisdom of the Perl Monks concerning the following question:

Suppose you have the following input (MIME encoded):

--_000_200907060005UAA14932pisas291mscom88clm_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

line1
line2
line3

--_000_200907060005UAA14932pisas291mscom88clm_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
...insert random MIME-ed html code here

--_000_200907060005UAA14932pisas291mscom88clm_--

--_004_38D25DCAD7370B4FACA079E2FAA2C690B02CB5NYWEXMB24msadmsco_--

Another complication is that the order of the parts is not consistent. Sometimes the text/plain part comes first and the text/html part second, and sometimes it is the other way around.

I mention this because it rules out the simple approach: reading the input from the top until "Content-Type: text/html" appears and storing that in, say, $txt_section, then reading the file from the bottom (with File::ReadBackwards, for instance) and storing that in $html_section.

So I'm looking for a module, presumably on CPAN, that I can tell "get me the 'text/plain' section out of this input" or "place the 'text/html' section of that input into @array".

Any suggestions for such a module, or maybe an alternative solution?

Re: MIME voodoo. by zwon (Abbot) on Jul 16, 2009 at 18:32 UTC
You can do this with Email::MIME: it parses the message and returns a list of the parts it finds, then you just check the content type of each part and take the one you want. No voodoo. http://emailproject.perl.org/ may also be of interest.
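A minimal sketch of that approach, not a definitive implementation. It assumes Email::MIME is installed from CPAN; the sample message, the boundary string and the `find_part` helper are made up for illustration. The HTML part is deliberately placed first to show that the order of the parts does not matter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical sample message. Note the top-level Content-Type header
# that declares the boundary used between the parts.
my $raw = <<'END_MESSAGE';
MIME-Version: 1.0
Subject: example
Content-Type: multipart/alternative; boundary="sample-boundary"

--sample-boundary
Content-Type: text/html; charset="iso-8859-1"

<p>hello</p>
--sample-boundary
Content-Type: text/plain; charset="iso-8859-1"

hello
--sample-boundary--
END_MESSAGE

# Hypothetical helper: return the first direct part whose media type is
# $wanted, ignoring parameters such as charset. Returns undef if none.
sub find_part {
    my ($email, $wanted) = @_;
    for my $part ($email->parts) {
        my $ct = $part->content_type;
        next unless defined $ct;
        return $part if $ct =~ m{^\s*\Q$wanted\E(?:\s*;|\s*$)}i;
    }
    return;
}

my $email = Email::MIME->new($raw);
for my $type ('text/plain', 'text/html') {
    my $part = find_part($email, $type);
    if ($part) {
        (my $body = $part->body) =~ s/\s+\z//;    # trim trailing newlines
        print "$type: $body\n";
    }
    else {
        print "$type: (not found)\n";
    }
}
```

This only looks at the direct children of the message. For nested multiparts, something like Email::MIME's `walk_parts` would probably be the better tool.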
Re^2: MIME voodoo. by vxp (Pilgrim) on Jul 16, 2009 at 18:56 UTC
Took a quick look and noticed something strange. Here's the input that I fed into it:

    --_000_200907060005UAA14932pisas291mscom88clm_
    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    line1
    line2
    line3

    --_000_200907060005UAA14932pisas291mscom88clm_
    Content-Type: text/html; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    <p>blah</p>
    <p>blah2</p>

    --_000_200907060005UAA14932pisas291mscom88clm_--

    --_004_38D25DCAD7370B4FACA079E2FAA2C690B02CB5NYWEXMB24msadmsco_--

Here's the code I came up with:

    #!/usr/bin/perl
    use Email::MIME;
    $file = shift;
    local( $/, *FILE ) ;
    open(FILE, $file);
    $message = <FILE>;
    close(FILE);
    my $parsed = Email::MIME->new($message);
    my @parts = $parsed->parts; # These will be Email::MIME objects, too.
    my $p = $parts[1]->content_type;
    print $p;

That is pretty much straight from the module's documentation. Here's the result it produces when I run it:

    $ ./grab.pl mime2
    Can't call method "content_type" on an undefined value at ./grab.pl line 16.
    $

Shouldn't `$parts[1]` contain the second MIME part of my input (the HTML)? Why is it undefined? And if I use @parts in scalar context, it claims there's only 1 MIME part in that input, which is also a blatant lie. :)

`$parts[0]` seems to contain what it's supposed to: it returns 'text/plain; charset="iso-8859-1"', as it should. Where'd `$parts[1]` go? :D Did I misinterpret the documentation somehow?
Re^3: MIME voodoo. by zwon (Abbot) on Jul 16, 2009 at 19:05 UTC
Your input isn't a correct MIME message. There should be a header that defines the boundary, and it's missing. Here's a small and dirty (sorry...) example:

    use strict;
    use warnings;
    use 5.010;
    use MIME::Lite;
    use Email::MIME;

    my $msg = MIME::Lite->new(
        From    => 'src@example.com',
        To      => 'dst@example.com',
        Subject => 'message',
        Type    => 'multipart/alternative',
    );
    $msg->attach(
        Type => 'text/plain',
        Data => 'this is text content',
    );
    $msg->attach(
        Type => 'text/html',
        Data => 'this is <b>html</b> content',
    );
    my $msg_str = $msg->as_string;
    print $msg_str;    # note the output here -- that's a complete message

    my $parsed = Email::MIME->new($msg_str);
    say "" x 50;
    if ($parsed->content_type =~ m{^multipart/alternative}) {
        say get_text_parts($parsed)->body;
    }

    sub get_text_parts {
        my @parts = shift->parts;
        my %ct;
        $ct{$_->content_type} = $_ for @parts;
        return $ct{'text/plain'} if exists $ct{'text/plain'};
        return $ct{'text/html'} if exists $ct{'text/html'};
        return $parts[0];
    }

Update: minor fix.

Update: note also that `$_->content_type` may return something like `text/plain; charset=utf-8`, and this code will fail in that case.
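Both points above (the missing boundary header, and content types that carry parameters) can be sketched together. This is only an illustration under stated assumptions: Email::MIME is installed from CPAN, the first `--...` line of the fragment really is the boundary, and `multipart/alternative` is a guess at the original top-level type. `wrap_fragment` and `parts_by_type` are hypothetical helpers, not part of any module.

```perl
use strict;
use warnings;
use 5.010;
use Email::MIME;

# The headerless fragment from the question.
my $fragment = <<'END_FRAGMENT';
--_000_200907060005UAA14932pisas291mscom88clm_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

line1
line2
line3

--_000_200907060005UAA14932pisas291mscom88clm_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<p>blah</p>
<p>blah2</p>

--_000_200907060005UAA14932pisas291mscom88clm_--
END_FRAGMENT

# Hypothetical helper: take the boundary from the first "--..." line and
# prepend the multipart header that the fragment lacks.
sub wrap_fragment {
    my ($text) = @_;
    my ($boundary) = $text =~ /^--(\S+)\s*$/m
        or die "no boundary line found\n";
    return qq{MIME-Version: 1.0\n}
         . qq{Content-Type: multipart/alternative; boundary="$boundary"\n\n}
         . $text;
}

# Hypothetical helper: index the direct parts by bare media type, dropping
# parameters such as charset. The first part of each type wins.
sub parts_by_type {
    my ($email) = @_;
    my %by_type;
    for my $part ($email->parts) {
        my $ct = $part->content_type;
        next unless defined $ct;
        my ($type) = $ct =~ m{^\s*([^;\s]+)};
        next unless defined $type;
        $by_type{ lc $type } //= $part;
    }
    return \%by_type;
}

my $parsed = Email::MIME->new( wrap_fragment($fragment) );
my @all    = $parsed->parts;
my $index  = parts_by_type($parsed);

say scalar(@all), " part(s) found";
print $index->{'text/plain'}->body if $index->{'text/plain'};
```

Guessing the boundary like this is fragile. If the original headers of the message are available, feeding the whole message to Email::MIME is the safer route.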
Re^4: MIME voodoo. by vxp (Pilgrim) on Jul 16, 2009 at 19:37 UTC
Re^5: MIME voodoo. by zwon (Abbot) on Jul 16, 2009 at 20:00 UTC
Some notes below your chosen depth have not been shown here