Trying to correctly parse MIME

Interwebs has asked for the wisdom of the Perl Monks concerning the following question:

Hello Gracious Monks,

I have been tearing out my hair for several days now trying to write a perl script that will take a MIME email, parse the header, parse the parts, decode(if necessary) and stick the parts that are text/plain into $body, and decode(if necessary) and strip tags from the parts that are text/html, then also placing them into $body. (This will lead to its actual function down the line.

The problem is, MIME::Decoder seems unable to parse my messages, and it claims that the message does not have separate parts. It also doesn't parse my header correctly, but that's not a big deal, as Email::MIME does.

Here is what I wrote so far (some code was grabbed from other perlmonks nodes):

#!/usr/local/bin/perl

use MIME::Parser;
use MIME::Entity;
use MIME::Body;

my (@body, $i, $subentity);
my $parser = new MIME::Parser;

$parser->output_to_core(1);

my $entity = $parser->parse_open("C:\\Documents and Settings\\user\\My
+ Documents\\email.txt");

%request = ();
my $header = $entity->head;
$request{SUBJECT}     = $header->get('Subject');        chomp $request
+{SUBJECT};
$request{TO}         = $header->get('To');            chomp $request{T
+O};
$request{FROM}         = $header->get('From');            chomp $reque
+st{FROM};
$request{CC}        = $header->get('CC');            chomp $request{CC
+};

if ($entity->parts > 0){
    for ($i=0; $i<$entity->parts; $i++){
        $subentity = $entity->parts($i);
        if (($subentity->mime_type =~ m/text\/html/i) || ($subentity->
+mime_type =~ m/text\/plain/i)){
            $body = join "",  @{$subentity->body};
            next;
        }
        
        #Multipart/Alternative
        elsif ($subentity->mime_type =~ m/multipart\/alternative/i){
            $body = join "",  @{$subentity->body};
            
            @body = split /------=_NextPart_\S*\n/, $body;
            
            $body = $body[1]; 
            
            $body =~ s/^Content-Type.*Content-Transfer-Encoding.*?\n+/
+/is;
            next;
        }
    } 
}
else {
print("No body text.");
}

$body = &untag($body);

#remove extra space
$body =~ tr/ //s;
$body =~ s/\n+ +\n//g;
$body =~ s/\n\n{2,}//g;

print($body);

sub untag {
  local $_ = $_[0] || $_;
# ALGORITHM:
#   find < ,
#       comment <!-- ... -->,
#       or comment <? ... ?> ,
#       or one of the start tags which require correspond
#           end tag plus all to end tag
#       or if \s or ="
#           then skip to next "
#           else [^>]
#   >
  s{
    <               # open tag
    (?:             # open group (A)
      (!--) |       #   comment (1) or
      (\?) |        #   another comment (2) or
      (?i:          #   open group (B) for /i
        ( TITLE  |  #     one of start tags
          SCRIPT |  #     for which
          APPLET |  #     must be skipped
          OBJECT |  #     all content
          STYLE     #     to correspond
        )           #     end tag (3)
      ) |           #   close group (B), or
      ([!/A-Za-z])  #   one of these chars, remember in (4)
    )               # close group (A)
    (?(4)           # if previous case is (4)
      (?:           #   open group (C)
        (?!         #     and next is not : (D)
          [\s=]     #       \s or "="
          ["`']     #       with open quotes
        )           #     close (D)
        [^>] |      #     and not close tag or
        [\s=]       #     \s or "=" with
        `[^`]*` |   #     something in quotes ` or
        [\s=]       #     \s or "=" with
        '[^']*' |   #     something in quotes ' or
        [\s=]       #     \s or "=" with
        "[^"]*"     #     something in quotes "
      )*            #   repeat (C) 0 or more times
    |               # else (if previous case is not (4))
      .*?           #   minimum of any chars
    )               # end if previous char is (4)
    (?(1)           # if comment (1)
      (?<=--)       #   wait for "--"
    )               # end if comment (1)
    (?(2)           # if another comment (2)
      (?<=\?)       #   wait for "?"
    )               # end if another comment (2)
    (?(3)           # if one of tags-containers (3)
      </            #   wait for end
      (?i:\3)       #   of this tag
      (?:\s[^>]*)?  #   skip junk to ">"
    )               # end if (3)
    >               # tag closed
   }{}gsx;          # STRIP THIS TAG
  return $_ ? $_ : "";
}
[download]

An email I'd parse might look like this:

           
MIME-Version: 1.0
Received: by xx.xxx.xxx.xxx with HTTP; Mon, 28 Jun 2010 11:16:13 -0700
+ (PDT)
X-Goomoji-Body: true
Date: Mon, 28 Jun 2010 14:16:13 -0400
Delivered-To: xxxxxx@gmail.com
Message-ID: <xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@mail.gmail.com>
Subject: dfgyhh
From: Foo Bar <xxxxxxx@gmail.com>
To: xxxxxxx@gmail.com
Content-Type: multipart/mixed; boundary=0016e646072eab7f26048a1b1d96

--0016e646072eab7f26048a1b1d96
Content-Type: multipart/related; boundary=0016e646072eab7f24048a1b1d95

--0016e646072eab7f24048a1b1d95
Content-Type: multipart/alternative; boundary=0016e646072eab7f20048a1b
+1d94

--0016e646072eab7f20048a1b1d94
Content-Type: text/plain; charset=ISO-8859-1

*This is totally obnoxious HTML EMAIL! I'm writing from GMAIL* [?][?][
+?] AND IT
MAKES *ME Want to throw up :(*
*Look at my rad alignment*

Lookie <www.yahoo.com>

--0016e646072eab7f20048a1b1d94
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<b>This is totally obnoxious HTML EMAIL! I&#39;m writing from <span st
+yle=
=3D"background-color: rgb(255, 102, 0);">GMAIL</span></b> <img goomoji
+=3D"4=
EF" style=3D"margin: 0pt 0.2ex; vertical-align: middle;" src=3D"cid:4E
+F@goo=
moji.gmail"><img goomoji=3D"32F" style=3D"margin: 0pt 0.2ex; vertical-
+align=
: middle;" src=3D"cid:32F@goomoji.gmail"><img goomoji=3D"1B2" style=3D
+"marg=
in: 0pt 0.2ex; vertical-align: middle;" src=3D"cid:1B2@goomoji.gmail">
+ <fon=
t size=3D"6">AND IT MAKES <u>ME <font face=3D"georgia,serif">Want to t
+hrow =
up :(</font></u></font> <br>
<div style=3D"text-align: right;"><font face=3D"arial black,sans-serif
+"><u>=
<font size=3D"2">Look at my rad alignment</font></u></font><br><br><a 
+href=
=3D"http://www.yahoo.com">Lookie</a><br><br>
</div>

--0016e646072eab7f20048a1b1d94--
--0016e646072eab7f24048a1b1d95
Content-Type: image/gif; name="4EF.gif"
Content-Transfer-Encoding: base64
X-Attachment-Id: 4EF@goomoji.gmail
Content-ID: <4EF@goomoji.gmail>

R0lGODlhDwAPAJEDAABGdACJ46PQ7QAAACH5BAEAAAMALAAAAAAPAA8AAAI4nI+ZwKwHhJ
+TgMdGC
gxFPEGhV54UBZl6acLIRKLosu8JqlL6hWl9wnPqYREPRbzdoKJeVh/OpKAAAOw==
--0016e646072eab7f24048a1b1d95
Content-Type: image/gif; name="32F.gif"
Content-Transfer-Encoding: base64
X-Attachment-Id: 32F@goomoji.gmail
Content-ID: <32F@goomoji.gmail>

R0lGODlhDAAMAKIGAIoAAf/v7/+goAAAAP/X1/9YWP///wAAACH5BAEAAAYALAAAAAAMAA
+wAAAMt
aLXcGjDKFcgg+GJq9dCUII4isYiFkK4q66JFQJJmjAE3Vlc3rnO6oEnRKBoSADs=
--0016e646072eab7f24048a1b1d95
Content-Type: image/gif; name="1B2.gif"
Content-Transfer-Encoding: base64
X-Attachment-Id: 1B2@goomoji.gmail
Content-ID: <1B2@goomoji.gmail>

R0lGODlhEAAMAKIGAOoDA4oAAf/v7wAAAP9YWP/IyAAAAAAAACH5BAEAAAYALAAAAAAQAA
+wAAAM9
SLbU3ssQ4IS91oEAgChgGHaBEQRCIILqGRHfIMqF8sb0QNvMt4Y8ie9XiwgLqlAyCGOdks
+UbERSU
PK6RBAA7
--0016e646072eab7f24048a1b1d95--
--0016e646072eab7f26048a1b1d96
Content-Type: application/pdf; name="something.pdf"
Content-Disposition: attachment; filename="something.pdf"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gazmqs6a1
[REALLY LONG ATTACHMENT]
--0016e646072eab7f26048a1b1d96--
[download]

I know it's not pretty, but please bear with me, I'm fairly new to perl. My question is this: what's the best approach to this task? Am I way off? Did I make a stupid mistake? Thanks, Nick

Comment on Trying to correctly parse MIME Select or Download Code