downer has asked for the wisdom of the Perl Monks concerning the following question:

I have been able to transform a MIME message into a MIME entity and get what I want from the header. The problem is, what do I now do with the one or more parts that result? I am trying to get just the raw terms from an email. How do I take a set of MIME parts and combine just their contents from the entity structure? Here is a sample email file, which I am feeding in via STDIN:
From armoraareoo@t-dialin.net Sun Apr 8 16:11:45 2007
Return-Path: <armoraareoo@t-dialin.net>
Received: from plg2.math.uwaterloo.ca (plg2.math.uwaterloo.ca [129.97.186.80])
    by speedy.uwaterloo.ca (8.12.8/8.12.5) with ESMTP id l38KBj0I004827
    for <theplg@speedy.uwaterloo.ca>; Sun, 8 Apr 2007 16:11:45 -0400
Received: from t-dialin.net (p508ee6ed.dip.t-dialin.net [80.142.230.237])
    by plg2.math.uwaterloo.ca (8.13.8/8.13.8) with SMTP id l38KAt7e009862;
    Sun, 8 Apr 2007 16:11:01 -0400 (EDT)
Message-ID: <2fee01c779eb$fb400220$c15f4e5d@armoraareoo>
From: "Drew" <armoraareoo@t-dialin.net>
To: "Lynsey Harvey" <dmason@plg2.math.uwaterloo.ca>
Cc: "Dorcas" <migod@plg2.math.uwaterloo.ca>,
    "Misty" <holt@plg2.math.uwaterloo.ca>,
    "Rosalia" <dsvetinovic@plg2.math.uwaterloo.ca>,
    "Bart Shaw" <y5guo@plg2.math.uwaterloo.ca>,
    "Alexia Myers" <the00@plg2.math.uwaterloo.ca>,
    "Lona Gomez" <adtrevors@plg2.math.uwaterloo.ca>,
    "Caridad Sims" <elterra@plg2.math.uwaterloo.ca>
Subject: How r u lately
Date: Sun, 08 Apr 2007 14:41:24 -0500
MIME-Version: 1.0
Content-Type: multipart/related;
    type="multipart/alternative";
    boundary="----=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB"
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2462.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2462.0000
X-Miltered: at mailchk-m02 with ID 46194C50.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Virus-Scanned: ClamAV version 0.90.1, clamav-milter version 0.90.1 on localhost
X-Virus-Status: Clean
X-UUID: 3e328b2a-cdb4-49f8-94ce-feeb89b85d5d
Status: O
Content-Length: 21559
Lines: 322

This is a multi-part message in MIME format.

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: multipart/alternative;
    boundary="----=_NextPart_CA0_4C28_95CE35A4.E636E095"

------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

part one of the document

------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

same document...

------=_NextPart_CA0_4C28_95CE35A4.E636E095--

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: image/gif; name="sumorg.gif"
Content-Transfer-Encoding: base64
Content-ID: <5627001c779eb7fbaa0e902503734a@armoraareoo>

image stuff...

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB--
Now, how can I ignore the image part and find the nested subparts? I have tried setting the flag $parser->parse_nested_messages(1); but this doesn't seem to do anything when I call $entity->dump_skeleton; to check the layout of the parts. Here is my code to get the entity:
#!/usr/bin/perl
use Email::AddressParser;
use Data::Dumper;
use MIME::Parser;
use strict;
use warnings;

undef $/;
my $message = <>;

my $parser = MIME::Parser->new;
$parser->tmp_to_core(1);
$parser->parse_nested_messages(1);

my $entity = $parser->parse_data($message);
$entity->dump_skeleton;

my $head = $entity->head;

my $subject = $head->get('Subject',0);
if($subject =~ /\n/) {
    chop($subject);
}

my $to = $head->get('To', 0);
if($to =~ /\n/) {
    chop($to);
}

my @addresses = Email::AddressParser->parse($to);
$to = $addresses[0]->address if(@addresses);

my $num_parts = $entity->parts;
print "$subject\t$to\t$num_parts\n";

$entity->purge;

Replies are listed 'Best First'.
Re: parsing mime emails
by moritz (Cardinal) on Mar 25, 2009 at 23:19 UTC
    MIME::Parser: can't flush: No space left on device

    So it tries to write on a partition that's full. On a unixish system you can find out which one is full by running df.

Re: parsing mime emails (revised!)
by ig (Vicar) on Mar 28, 2009 at 18:35 UTC
    now how can i ignore the image part, find the nested subparts?

    The following walks every part of an entity, including nested subparts (parts_DFS returns the entity itself and all of its parts in depth-first order), and skips any part whose MIME type is 'image/gif':

    foreach my $part ($entity->parts_DFS) {
        my $type = $part->mime_type;
        next if($type eq 'image/gif');
        print "Here's a part: $type\n";
        # do what you want with the part here
    }
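
To go from that loop to "just the contents", one option is to keep only the text/plain leaf parts and join their decoded bodies. The following is a minimal sketch, not necessarily what the poster above had in mind. The embedded message is a cut-down stand-in for the sample mail (the boundaries "outer" and "inner" and the tiny base64 payload are made up for illustration), and output_to_core(1) is used here so that decoded bodies stay in memory instead of being written to files. As far as I know, parse_nested_messages only concerns attached message/rfc822 parts, which would explain why setting it changes nothing for a plain multipart mail like this one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical, cut-down version of the sample mail: same nesting
# (multipart/related > multipart/alternative > text parts, plus an image).
my $raw = <<'EOM';
MIME-Version: 1.0
Subject: How r u lately
Content-Type: multipart/related; boundary="outer"

This is a multi-part message in MIME format.
--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="us-ascii"

part one of the document
--inner
Content-Type: text/html; charset="us-ascii"

<p>same document...</p>
--inner--
--outer
Content-Type: image/gif; name="sumorg.gif"
Content-Transfer-Encoding: base64

R0lGODlhAQABAAAAACw=
--outer--
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory, not on disk
my $entity = $parser->parse_data($raw);

my @chunks;
foreach my $part ($entity->parts_DFS) {
    # Multipart containers, the HTML alternative and the image are skipped.
    next unless $part->mime_type eq 'text/plain';
    my $body = $part->bodyhandle or next;   # containers have no body handle
    push @chunks, $body->as_string;         # transfer encoding already decoded
}

my $text = join("\n", @chunks);
print "$text\n";
```

If you also want the HTML alternative, widen the test to something like $part->mime_type =~ m{^text/} and strip the markup yourself afterwards.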