in reply to Re^3: MIME voodoo.
in thread MIME voodoo.

This works, and doesn't work - at the same time.

I think an explanation is due after a statement like that, so here goes:

Take this code:

#!/usr/bin/perl
use Email::MIME;

$file  = shift;
$which = shift;

##############################
# $which is:
# text = plain text portion
# html = html portion
##############################

local( $/, *FILE ) ;
open(FILE, $file);
$message = <FILE>;
close(FILE);

my $parsed = Email::MIME->new($message);
if ($parsed->content_type =~ m{^multipart/alternative}) {
    print get_text_parts($parsed)->body;
}

sub get_text_parts {
    my @parts = shift->parts;
    my %ct;
    $ct{$_->content_type} = $_ for @parts;
    return $ct{'text/plain'} if exists $ct{'text/plain'};
    return $ct{'text/html'} if exists $ct{'text/html'};
    return $parts[0] if $which =~ /text/;
    return $parts[1] if $which =~ /html/;
}

And also take this input:

Content-Type: multipart/alternative; boundary="_000_200907060005UAA14932pisas291mscom88clm_"
MIME-Version: 1.0

--_000_200907060005UAA14932pisas291mscom88clm_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

line1
line2
line3

--_000_200907060005UAA14932pisas291mscom88clm_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<p>blah</p>
<p>blah2</p>

--_000_200907060005UAA14932pisas291mscom88clm_--

--_004_38D25DCAD7370B4FACA079E2FAA2C690B02CB5NYWEXMB24msadmsco_--

When you run it, the results are as follows (this is the "WORKS" part):

$ ./grab.pl mime4 text
line1
line2
line3
$ ./grab.pl mime4 html
<p>blah</p>
<p>blah2</p>
$

Now, what does NOT work: as I said in my original request, sometimes the plain text MIME part is on top, and sometimes it's on the bottom. Using this code, if you switch the two mimes around it doesn't work. Try switching them around and telling the code to give you the HTML portion; it'll give you the plain text portion instead.

Is there any way, to your knowledge, to specifically request the HTML or the text portion, so that it doesn't matter what order the MIME parts are in the input?

Re^5: MIME voodoo.
by zwon (Abbot) on Jul 16, 2009 at 20:00 UTC
    if you switch the two mimes around it doesn't work

    Note my last comment for the example. There's a bug, as $_->content_type returns not just text/plain, but text/plain; charset=.... So you should replace

    $ct{$_->content_type} = $_ for @parts;
    with
    for (@parts) {
        (my $c = $_->content_type) =~ s/;.+//;
        $ct{$c} = $_;
    }

    This solution, though it would work for most messages, isn't perfect either, as a message may contain several text/plain parts with different charsets.
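To illustrate the caveat about several parts sharing one type, here is a minimal sketch that works on plain header strings rather than on Email::MIME objects. The helper names `base_type` and `group_by_type` are made up for this example; the idea is simply to strip the parameters from each Content-Type value and keep every part of a given type instead of overwriting earlier ones.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a Content-Type header value to its bare
# media type, e.g. 'text/plain; charset="iso-8859-1"' -> 'text/plain'.
sub base_type {
    my ($header) = @_;
    my ($type) = split /;/, $header, 2;   # drop parameters, if any
    $type = '' unless defined $type;
    $type =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
    return lc $type;                      # media types are case-insensitive
}

# Hypothetical helper: several parts may share a bare type, so collect
# them in arrays instead of letting later parts overwrite earlier ones.
sub group_by_type {
    my (@headers) = @_;
    my %by_type;
    push @{ $by_type{ base_type($_) } }, $_ for @headers;
    return \%by_type;
}

my $groups = group_by_type(
    'text/plain; charset="iso-8859-1"',
    'Text/HTML; charset="iso-8859-1"',
    'text/plain; charset="utf-8"',
);
printf "%s: %d part(s)\n", $_, scalar @{ $groups->{$_} } for sort keys %$groups;
```

With real messages the strings would come from each part's `content_type`; which of several same-typed parts to prefer (by charset, by position) is a decision the sketch leaves open.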

      Ah! I see now - you're getting rid of the semicolon and everything that follows, so you're only left with text/plain or text/html. I see. :)

      Any ideas why it only returns the plain text portion, and no html?

      sub get_text_parts {
          my @parts = shift->parts;
          my %ct;
          # $ct{$_->content_type} = $_ for @parts;
          for (@parts) {
              (my $c = $_->content_type) =~ s/;.+//;
              # print "\nDEBUG: $c\n";
              $ct{$c} = $_;
          }
          return $ct{'text/plain'} if exists $ct{'text/plain'};
          return $ct{'text/html'} if exists $ct{'text/html'};
          return $parts[0] if $which =~ /text/;
          return $parts[1] if $which =~ /html/;
      }

      outputs:

      $ ./grab.pl mime4 html
      line1
      line2
      line3
      $ ./grab.pl mime4 text
      line1
      line2
      line3
      $

      I really appreciate you taking the time to help out, thanks a lot :)

        return $ct{'text/plain'} if exists $ct{'text/plain'};
        return $ct{'text/html'} if exists $ct{'text/html'};
        return $parts[0] if $which =~ /text/;
        return $parts[1] if $which =~ /html/;

        should be

        my $ct = $which eq 'html' ? 'text/html' : 'text/plain';
        return $ct{$ct} if exists $ct{$ct};

        # Fall back to plain text or HTML
        return $ct{'text/plain'} if exists $ct{'text/plain'};
        return $ct{'text/html'} if exists $ct{'text/html'};

        # Fall back to whatever's first
        return $parts[0];

        By the way,

        $ct{$_->content_type} = $_ for @parts;

        should be changed to

        $ct{$_->content_type} ||= $_ for @parts;

        to get the *first* part with the right content type, not the last.
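Putting the suggestions in this thread together, a minimal self-contained sketch might look like the following. `FakePart` is a hypothetical stand-in so the example runs without Email::MIME installed (real code would pass the objects returned by `parts`), and `pick_part` is a made-up name. The sketch strips the Content-Type parameters, keeps the first part of each type with `||=`, and looks up the requested type before falling back.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed MIME part; real code would use the
# objects returned by Email::MIME's parts method.
package FakePart;
sub new          { my ($class, %args) = @_; return bless {%args}, $class }
sub content_type { $_[0]{content_type} }
sub body         { $_[0]{body} }

package main;

# Return the part matching $which ('html' or 'text') regardless of part
# order; fall back to the other text type, then to the first part.
sub pick_part {
    my ($which, @parts) = @_;
    my %ct;
    for my $part (@parts) {
        (my $type = lc $part->content_type) =~ s/;.*//s;  # strip parameters
        $ct{$type} ||= $part;                               # keep the first of each type
    }
    my $wanted = $which eq 'html' ? 'text/html' : 'text/plain';
    for my $type ($wanted, 'text/plain', 'text/html') {
        return $ct{$type} if exists $ct{$type};
    }
    return $parts[0];
}

# HTML deliberately comes first here, the order that broke the original code.
my @parts = (
    FakePart->new(content_type => 'text/html; charset="iso-8859-1"',  body => "<p>blah</p>\n"),
    FakePart->new(content_type => 'text/plain; charset="iso-8859-1"', body => "line1\n"),
);
print pick_part('html', @parts)->body;
print pick_part('text', @parts)->body;
```

Passing `$which` as an argument instead of reading a global is a choice made for the sketch, so the function can be exercised on its own.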

        Any ideas why it only returns the plain text portion, and no html?

        Sure ;) Look at this: there are four returns, and the first one always fires:

        # this one returns the text/plain part
        return $ct{'text/plain'} if exists $ct{'text/plain'};
        # and this one is not executed if there's a text/plain part
        return $ct{'text/html'} if exists $ct{'text/html'};
        # these are not executed at all, and they are not correct!
        # $parts[0] may contain text/html and $parts[1] text/plain
        return $parts[0] if $which =~ /text/;
        return $parts[1] if $which =~ /html/;