
I have been able to parse a MIME message into a MIME entity and get what I want from the header. The problem is: what do I do now with the one or more parts that result? I am trying to get just the raw terms from an email. How do I take a set of MIME parts and combine just their contents from the entity structure? Here is a sample email file that I am feeding in via stdin:

From armoraareoo@t-dialin.net  Sun Apr  8 16:11:45 2007
Return-Path: <armoraareoo@t-dialin.net>
Received: from plg2.math.uwaterloo.ca (plg2.math.uwaterloo.ca [129.97.186.80])
    by speedy.uwaterloo.ca (8.12.8/8.12.5) with ESMTP id l38KBj0I004827
    for <theplg@speedy.uwaterloo.ca>; Sun, 8 Apr 2007 16:11:45 -0400
Received: from t-dialin.net (p508ee6ed.dip.t-dialin.net [80.142.230.237])
    by plg2.math.uwaterloo.ca (8.13.8/8.13.8) with SMTP id l38KAt7e009862;
    Sun, 8 Apr 2007 16:11:01 -0400 (EDT)
Message-ID: <2fee01c779eb$fb400220$c15f4e5d@armoraareoo>
From: "Drew" <armoraareoo@t-dialin.net>
To: "Lynsey Harvey" <dmason@plg2.math.uwaterloo.ca>
Cc: "Dorcas" <migod@plg2.math.uwaterloo.ca>,
   "Misty" <holt@plg2.math.uwaterloo.ca>,
   "Rosalia" <dsvetinovic@plg2.math.uwaterloo.ca>,
   "Bart Shaw" <y5guo@plg2.math.uwaterloo.ca>,
   "Alexia Myers" <the00@plg2.math.uwaterloo.ca>,
   "Lona Gomez" <adtrevors@plg2.math.uwaterloo.ca>,
   "Caridad Sims" <elterra@plg2.math.uwaterloo.ca>
Subject: How r u lately
Date: Sun, 08 Apr 2007 14:41:24 -0500
MIME-Version: 1.0
Content-Type: multipart/related;
    type="multipart/alternative";
    boundary="----=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB"
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2462.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2462.0000
X-Miltered: at mailchk-m02 with ID 46194C50.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Virus-Scanned: ClamAV version 0.90.1, clamav-milter version 0.90.1 on localhost
X-Virus-Status: Clean
X-UUID: 3e328b2a-cdb4-49f8-94ce-feeb89b85d5d
Status: O
Content-Length: 21559
Lines: 322

This is a multi-part message in MIME format.

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: multipart/alternative;
    boundary="----=_NextPart_CA0_4C28_95CE35A4.E636E095"

------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/plain;
    charset="us-ascii"
Content-Transfer-Encoding: quoted-printable




part one of the document

------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/html;
    charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

same document... 


------=_NextPart_CA0_4C28_95CE35A4.E636E095--

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: image/gif;
    name="sumorg.gif"
Content-Transfer-Encoding: base64
Content-ID: <5627001c779eb7fbaa0e902503734a@armoraareoo>
image stuff... 
------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB--

Now, how can I ignore the image part and find the nested subparts? I have tried setting the flag $parser->parse_nested_messages(1); but this doesn't seem to do anything when I call $entity->dump_skeleton; to check the layout of the parts. Here is my code to get the entity:

#!/usr/bin/perl
use Email::AddressParser;
use Data::Dumper;
use MIME::Parser;
use strict;
use warnings;

undef $/;
my $message = <>;

my $parser = MIME::Parser->new;
$parser->tmp_to_core(1);
$parser->parse_nested_messages(1);
my $entity = $parser->parse_data($message);

$entity->dump_skeleton;

my $head = $entity->head;

# get() returns the raw header text, including its trailing newline
my $subject = $head->get('Subject',0);
chomp($subject) if defined $subject;

my $to = $head->get('To', 0);
chomp($to) if defined $to;
my @addresses = Email::AddressParser->parse($to);
$to = $addresses[0]->address if(@addresses);
my $num_parts = $entity->parts;
print "$subject\t$to\t$num_parts\n";

$entity->purge;
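
For what it is worth, here is a rough sketch of one way the nested parts might be walked, not a definitive answer. It assumes that only the text/plain leaves are wanted and that "terms" simply means runs of word characters. The helper name collect_text and the trimmed-down stand-in message are made up for illustration. As far as I can tell from the MIME::Parser documentation, the nested-messages option concerns embedded message/rfc822 parts rather than multiparts inside multiparts, which may be why the flag changes nothing for this email.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: recursively gather the decoded bodies of every
# leaf part whose type matches $wanted_type (default text/plain).
# Multipart containers are descended into; other leaves such as
# image/gif are skipped.
sub collect_text {
    my ($entity, $wanted_type) = @_;
    $wanted_type ||= 'text/plain';

    my @parts = $entity->parts;
    return map { collect_text($_, $wanted_type) } @parts if @parts;

    return () unless $entity->effective_type eq $wanted_type;
    my $body = $entity->bodyhandle or return ();
    return ($body->as_string);
}

# A cut-down stand-in for the sample email, with the same nesting.
my $message = <<'EOM';
MIME-Version: 1.0
Content-Type: multipart/related; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="us-ascii"

part one of the document
--inner
Content-Type: text/html; charset="us-ascii"

<p>same document</p>
--inner--
--outer
Content-Type: image/gif; name="sumorg.gif"
Content-Transfer-Encoding: base64

R0lGODlhAQABAAAAACw=
--outer--
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory
my $entity = $parser->parse_data($message);

# Join the text bodies and split them into crude "terms".
my $text  = join "\n", collect_text($entity);
my @terms = grep { length } split /\W+/, $text;
print join(' ', @terms), "\n";
```

Passing 'text/html' as the second argument to collect_text would pick up the HTML alternative instead, though the markup would then still need stripping before splitting into terms.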

In reply to parsing mime emails (revised!) by downer
