in reply to MIME voodoo.

You can do it using Email::MIME. It will parse the message and return a list of the parts it found; then you just check the content type of each part and take the one you want. No voodoo. http://emailproject.perl.org/ may also be of interest to you.

Re^2: MIME voodoo.
by vxp (Pilgrim) on Jul 16, 2009 at 18:56 UTC

    Took a quick look and noticed something strange.

    Here's the input that I fed into it:

    --_000_200907060005UAA14932pisas291mscom88clm_
    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    line1
    line2
    line3

    --_000_200907060005UAA14932pisas291mscom88clm_
    Content-Type: text/html; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    <p>blah</p>
    <p>blah2</p>

    --_000_200907060005UAA14932pisas291mscom88clm_--

    --_004_38D25DCAD7370B4FACA079E2FAA2C690B02CB5NYWEXMB24msadmsco_--

    Here's the code I came up with:

    #!/usr/bin/perl
    use Email::MIME;

    $file = shift;
    local( $/, *FILE ) ;
    open(FILE, $file);
    $message = <FILE>;
    close(FILE);

    my $parsed = Email::MIME->new($message);
    my @parts = $parsed->parts; # These will be Email::MIME objects, too.
    my $p = $parts[1]->content_type;
    print $p;

    That is pretty much straight from the module's documentation. Here's what it produces when I run it:

    $ ./grab.pl mime2
    Can't call method "content_type" on an undefined value at ./grab.pl line 16.
    $

    Shouldn't $parts[1] contain the second MIME part of my input (the HTML one)? Why is it undef? And if I use @parts in scalar context, it claims there's only 1 MIME part in that input, which is also a blatant lie. :)

    $parts[0] seems to contain what it's supposed to: it returns 'text/plain; charset="iso-8859-1"', as expected. Where'd $parts[1] go? :D

    Did I misinterpret the documentation somehow?

      Your input isn't a valid MIME message. There should be a header that defines the boundary, and it's missing. Here's a small and dirty (sorry...) example:

      use strict;
      use warnings;
      use 5.010;
      use MIME::Lite;
      use Email::MIME;

      my $msg = MIME::Lite->new(
          From    => 'src@example.com',
          To      => 'dst@example.com',
          Subject => 'message',
          Type    => 'multipart/alternative',
      );
      $msg->attach(
          Type => 'text/plain',
          Data => 'this is text content',
      );
      $msg->attach(
          Type => 'text/html',
          Data => 'this is <b>html</b> content',
      );
      my $msg_str = $msg->as_string;
      print $msg_str; # note the output here -- that's a complete message

      my $parsed = Email::MIME->new($msg_str);
      say "*" x 50;
      if ($parsed->content_type =~ m{^multipart/alternative}) {
          say get_text_parts($parsed)->body;
      }

      sub get_text_parts {
          my @parts = shift->parts;
          my %ct;
          $ct{$_->content_type} = $_ for @parts;
          return $ct{'text/plain'} if exists $ct{'text/plain'};
          return $ct{'text/html'} if exists $ct{'text/html'};
          return $parts[0];
      }

      Upd: minor fix

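      Upd: Note also that $_->content_type may return something like text/plain; charset=utf-8. In that case the hash keys will not be the bare 'text/plain' or 'text/html', so the lookups above will miss and the code will fall through to $parts[0].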
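
      One way around the parameter problem, as a rough sketch only: reduce the header value to its bare type/subtype before using it as a hash key. The helper name base_type is made up for this illustration and is not part of Email::MIME; it only assumes that the type comes first and that parameters follow a semicolon.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper (not part of Email::MIME): reduce a Content-Type
      # header value to its bare "type/subtype", lowercased, dropping any
      # parameters such as charset.
      sub base_type {
          my ($ct) = @_;
          return '' unless defined $ct;
          # Capture everything up to the first semicolon or whitespace.
          my ($type) = $ct =~ m{^\s*([^;\s]+)};
          return defined $type ? lc $type : '';
      }

      print base_type('text/plain; charset="iso-8859-1"'), "\n";   # text/plain
      print base_type('Text/HTML'), "\n";                          # text/html
      ```

      With something like this, the line in get_text_parts could become $ct{ base_type($_->content_type) } = $_ for @parts; so the 'text/plain' and 'text/html' lookups have a chance to match.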

        This works, and doesn't work - at the same time.

        I think an explanation is due after a statement like that, so here goes:

        Take this code:

        #!/usr/bin/perl
        use Email::MIME;

        $file = shift;
        $which = shift;
        ##############################
        # $which is:
        # text = plain text portion
        # html = html portion
        ##############################

        local( $/, *FILE ) ;
        open(FILE, $file);
        $message = <FILE>;
        close(FILE);

        my $parsed = Email::MIME->new($message);
        if ($parsed->content_type =~ m{^multipart/alternative}) {
            print get_text_parts($parsed)->body;
        }

        sub get_text_parts {
            my @parts = shift->parts;
            my %ct;
            $ct{$_->content_type} = $_ for @parts;
            return $ct{'text/plain'} if exists $ct{'text/plain'};
            return $ct{'text/html'} if exists $ct{'text/html'};
            return $parts[0] if $which =~ /text/;
            return $parts[1] if $which =~ /html/;
        }

        And also take this input:

        Content-Type: multipart/alternative; boundary="_000_200907060005UAA14932pisas291mscom88clm_"
        MIME-Version: 1.0

        --_000_200907060005UAA14932pisas291mscom88clm_
        Content-Type: text/plain; charset="iso-8859-1"
        Content-Transfer-Encoding: quoted-printable

        line1
        line2
        line3

        --_000_200907060005UAA14932pisas291mscom88clm_
        Content-Type: text/html; charset="iso-8859-1"
        Content-Transfer-Encoding: quoted-printable

        <p>blah</p>
        <p>blah2</p>

        --_000_200907060005UAA14932pisas291mscom88clm_--

        --_004_38D25DCAD7370B4FACA079E2FAA2C690B02CB5NYWEXMB24msadmsco_--

        When you run it, the results are as follows (this is the "WORKS" part):

        $ ./grab.pl mime4 text
        line1
        line2
        line3
        $ ./grab.pl mime4 html
        <p>blah</p>
        <p>blah2</p>
        $

        Now, what does NOT work: as I said in my original request, sometimes the plain text part is on top and sometimes it's on the bottom. With this code, if you switch the two parts around, it doesn't work. Try switching them around and telling the code to give you the HTML portion: it'll give you the plain text portion instead.

        Is there any way to specifically request the HTML or the text portion (so that the order of the MIME parts in the input doesn't matter), to your knowledge?
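
        A possible approach, offered as a sketch rather than a definitive answer: since content_type returns the full header value (for example 'text/plain; charset="iso-8859-1"'), the %ct lookups in the code above never match and it falls back to picking by position. Comparing only the bare type of each part avoids depending on the order. In the sketch below, find_part and %type_for are made-up names, the sample message with boundary "b1" is invented for illustration, and the code assumes an Email::MIME version that provides subparts.

        ```perl
        use strict;
        use warnings;
        use Email::MIME;

        # Hypothetical helper (not part of Email::MIME): return the first leaf
        # part whose bare content type equals $wanted, searching nested
        # multipart containers as well. Returns undef if nothing matches.
        sub find_part {
            my ($entity, $wanted) = @_;
            my @children = $entity->subparts;
            if (!@children) {
                # Drop parameters such as charset before comparing.
                my $ct = $entity->content_type;
                $ct = '' unless defined $ct;
                my ($type) = $ct =~ m{^\s*([^;\s]+)};
                return (defined $type && lc($type) eq lc($wanted)) ? $entity : undef;
            }
            for my $child (@children) {
                my $hit = find_part($child, $wanted);
                return $hit if $hit;
            }
            return undef;
        }

        # Invented sample message, deliberately with the HTML part first.
        my $message = join "\n",
            'MIME-Version: 1.0',
            'Content-Type: multipart/alternative; boundary="b1"',
            '',
            '--b1',
            'Content-Type: text/html; charset="iso-8859-1"',
            '',
            '<p>blah</p>',
            '--b1',
            'Content-Type: text/plain; charset="iso-8859-1"',
            '',
            'line1',
            '--b1--',
            '';

        # Map the command-line word to a MIME type; default to html.
        my %type_for = (text => 'text/plain', html => 'text/html');
        my $which    = @ARGV ? shift(@ARGV) : 'html';
        my $wanted   = $type_for{$which} or die "usage: $0 [text|html]\n";

        my $part = find_part(Email::MIME->new($message), $wanted);
        print $part ? $part->body : "no $wanted part found\n";
        ```

        To use it on a real file, read the file into $message as in the script above instead of building the sample string. Swapping the two parts in the input should not change which one is returned, because the selection is by type and not by index.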