I have been able to transform a MIME message into a MIME entity and get what I want from the header. The problem is: what do I now do with the one or more parts that result? I am trying to get just the raw terms from an email. How do I take a set of MIME parts and combine just their contents from the entity structure? Here is a sample email file, which I am passing in via STDIN:
From armoraareoo@t-dialin.net Sun Apr 8 16:11:45 2007
Return-Path: <armoraareoo@t-dialin.net>
Received: from plg2.math.uwaterloo.ca (plg2.math.uwaterloo.ca [129.97.186.80]) by speedy.uwaterloo.ca (8.12.8/8.12.5) with ESMTP id l38KBj0I004827 for <theplg@speedy.uwaterloo.ca>; Sun, 8 Apr 2007 16:11:45 -0400
Received: from t-dialin.net (p508ee6ed.dip.t-dialin.net [80.142.230.237]) by plg2.math.uwaterloo.ca (8.13.8/8.13.8) with SMTP id l38KAt7e009862; Sun, 8 Apr 2007 16:11:01 -0400 (EDT)
Message-ID: <2fee01c779eb$fb400220$c15f4e5d@armoraareoo>
From: "Drew" <armoraareoo@t-dialin.net>
To: "Lynsey Harvey" <dmason@plg2.math.uwaterloo.ca>
Cc: "Dorcas" <migod@plg2.math.uwaterloo.ca>, "Misty" <holt@plg2.math.uwaterloo.ca>, "Rosalia" <dsvetinovic@plg2.math.uwaterloo.ca>, "Bart Shaw" <y5guo@plg2.math.uwaterloo.ca>, "Alexia Myers" <the00@plg2.math.uwaterloo.ca>, "Lona Gomez" <adtrevors@plg2.math.uwaterloo.ca>, "Caridad Sims" <elterra@plg2.math.uwaterloo.ca>
Subject: How r u lately
Date: Sun, 08 Apr 2007 14:41:24 -0500
MIME-Version: 1.0
Content-Type: multipart/related; type="multipart/alternative"; boundary="----=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB"
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2462.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2462.0000
X-Miltered: at mailchk-m02 with ID 46194C50.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Virus-Scanned: ClamAV version 0.90.1, clamav-milter version 0.90.1 on localhost
X-Virus-Status: Clean
X-UUID: 3e328b2a-cdb4-49f8-94ce-feeb89b85d5d
Status: O
Content-Length: 21559
Lines: 322

This is a multi-part message in MIME format.

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: multipart/alternative; boundary="----=_NextPart_CA0_4C28_95CE35A4.E636E095"

------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

part one of the document

------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

same document...

------=_NextPart_CA0_4C28_95CE35A4.E636E095--

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: image/gif; name="sumorg.gif"
Content-Transfer-Encoding: base64
Content-ID: <5627001c779eb7fbaa0e902503734a@armoraareoo>

image stuff...

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB--
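For the question of combining just the contents of the parts, one possible approach is sketched below, assuming MIME::Parser from MIME-tools with output_to_core(1) so decoded bodies stay in memory. MIME::Entity's parts_DFS returns the entity itself followed by every nested part in depth-first order, so the tree can be flattened, filtered to text/plain, and the decoded bodies joined. The helper name plain_text_of is hypothetical, not part of MIME-tools.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: concatenate the decoded bodies of every text/plain
# part anywhere in the entity tree. Multipart containers, HTML alternatives
# and attachments such as image/gif are skipped.
sub plain_text_of {
    my ($entity) = @_;
    my @texts;
    for my $part ($entity->parts_DFS) {          # the entity itself, then all nested parts
        next unless $part->effective_type eq 'text/plain';
        my $body = $part->bodyhandle or next;    # containers have no body of their own
        push @texts, $body->as_string;           # transfer encoding (QP, base64) already undone
    }
    return join "\n", @texts;
}
```

With the script from the question this would be used as: $parser->output_to_core(1); my $entity = $parser->parse_data($message); print plain_text_of($entity); Note that as_string returns bytes in the part's charset (us-ascii here) without charset decoding, and a text/plain file attachment would also be picked up by this filter.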
Now, how can I ignore the image part and find the nested subparts? I have tried setting the flag $parser->parse_nested_messages(1); but this doesn't seem to do anything when I run $entity->dump_skeleton; to check the layout of the parts. Here is my code to get the entity:
#!/usr/bin/perl
use strict;
use warnings;
use Email::AddressParser;
use Data::Dumper;
use MIME::Parser;

# Slurp the whole message from STDIN
my $message = do { local $/; <> };

my $parser = MIME::Parser->new;
$parser->tmp_to_core(1);
$parser->parse_nested_messages(1);
my $entity = $parser->parse_data($message);
$entity->dump_skeleton;

my $head = $entity->head;

# Header values come back with a trailing newline; chomp removes only that
my $subject = $head->get('Subject', 0);
chomp $subject if defined $subject;
my $to = $head->get('To', 0);
chomp $to if defined $to;

my @addresses = Email::AddressParser->parse($to);
$to = $addresses[0]->address if @addresses;

my $num_parts = $entity->parts;    # scalar context: number of top-level parts
print "$subject\t$to\t$num_parts\n";
$entity->purge;
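On the parse_nested_messages flag: as far as I understand the MIME-tools documentation, it is the older name of extract_nested_messages and only affects embedded message/rfc822 (forwarded message) parts. Nested multipart/* containers, like the multipart/alternative inside this message's multipart/related, are parsed into sub-entities regardless, so $entity->parts gives the top-level parts and each of those has its own parts. The sketch below walks that tree recursively, ignores non-text leaves such as the image/gif, and inside multipart/alternative keeps only one version (text/plain if present, else text/html) so the same text is not read twice. best_text is a hypothetical name, and the HTML fallback returns markup unstripped.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: recursively collect readable text from an entity,
# ignoring non-text leaves (image/gif etc.) and choosing a single version
# inside each multipart/alternative.
sub best_text {
    my ($entity) = @_;
    my $type = $entity->effective_type;

    if ($type eq 'multipart/alternative') {
        my @alts = $entity->parts;
        # Prefer the plain-text alternative, then HTML, then the first part.
        my ($pick) = grep { $_->effective_type eq 'text/plain' } @alts;
        ($pick) = grep { $_->effective_type eq 'text/html' } @alts unless $pick;
        $pick ||= $alts[0];
        return $pick ? best_text($pick) : '';
    }
    if (my @kids = $entity->parts) {
        # multipart/related, multipart/mixed, or an extracted message/rfc822:
        # descend into every child and join the non-empty results.
        return join "\n", grep { length } map { best_text($_) } @kids;
    }
    if ($type =~ m{^text/(?:plain|html)$}) {
        my $body = $entity->bodyhandle;
        return $body ? $body->as_string : '';
    }
    return '';    # images and other attachments are ignored
}
```

Compared with simply filtering all text/plain leaves, this keeps the HTML version for messages that have no plain-text alternative, at the cost of a few more lines of recursion.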

In reply to parsing mime emails (revised!) by downer
