in thread Extracting TEXT from email

I'm at the "receiving end" of the mail wire: I receive mails in my (MS Windows Exchange) inbox, encoded in the standard mail/MIME format.

I'm interested in the text part of the body of these mails, that is, what follows the mail header (i.e. the From:, Sent:, To:, Subject: stuff). The body contains haiku entries, which I parse and reshuffle into a voting list and subsequently rank according to the votes received, but the app as such is not that interesting in this context.
An example of the text part of a mail message is:

[author] xxx yyy [1] clouds . . . the distance blossoming between two crows [2] a morning without incident dead fly [3] sunrise ceremony the holy man's third eye bloodshot [4] morning dew bell bottoms darkened by mayflies

This is what I'm interested in parsing out, and this is the text part of the message as it is displayed in the mail client (in this case MS Outlook).
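
For what it's worth, the parsing step itself might look roughly like the sketch below. It assumes every message follows the layout of the example above (an "[author]" tag followed by entries tagged "[1]", "[2]", ...); the helper name parse_entries is made up for illustration, and real messages may well need more forgiving patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: split a flattened message into the author name and
# the numbered entries. Assumes the layout "[author] NAME [1] TEXT [2] TEXT".
sub parse_entries {
    my ($text) = @_;

    # Everything after "[author]" up to the first "[N]" tag (or end of text).
    my ($author) = $text =~ /\[author\]\s*(.*?)\s*(?=\[\d+\]|\z)/s;

    # Each "[N]" tag starts an entry that runs up to the next tag or the end.
    my %entry;
    while ($text =~ /\[(\d+)\]\s*(.*?)\s*(?=\[\d+\]|\z)/gs) {
        $entry{$1} = $2;
    }
    return ($author, \%entry);
}

my $sample = "[author] xxx yyy [1] clouds . . . the distance blossoming between two crows "
           . "[2] a morning without incident dead fly";
my ($author, $entries) = parse_entries($sample);

print "author: $author\n";
print "$_: $entries->{$_}\n" for sort { $a <=> $b } keys %$entries;
```

The lookahead keeps each entry from swallowing the next "[N]" tag, so the entries come out trimmed and keyed by their number.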

The problem is that the above text is not what I get from the mail body handed over by the MIME modules mentioned earlier. Instead I get the full mail body segment, including binary MIME encodings and HTML tagging.

So I have to do some filtering to get at the text "payload" that I need for the app. I was wondering whether anybody had already wrapped this functionality into a function, possibly in a MIME module. That was my question.

I haven't worked with email before, so maybe I'm simply overlooking some basic assumptions about the MIME format and its parsing...
-- allan

Re^3: Extracting TEXT from email
by jhourcle (Prior) on Apr 30, 2005 at 19:23 UTC

    I find it's good to understand what you're working with. (This coming from someone who deals with data at work without having any idea what it actually means.)

    Basically, email is sent using SMTP as what it calls a mail object, whose content is composed of headers, an empty line, and a message body.
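
    To make the "headers, empty line, body" layout concrete, here is a minimal sketch using only core Perl. The sample message and the address in it are made up, and split_message is a hypothetical helper name. It only splits at the first empty line; it does not unfold headers or decode anything.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw message at the first empty line.
# Headers come before it, the body after it. Accepts CRLF line endings
# (as used on the wire) as well as bare LF.
sub split_message {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    return ($head, defined $body ? $body : '');
}

# Made-up sample message, for illustration only.
my $raw = "From: allan\@example.org\r\nSubject: haiku\r\n\r\n[1] morning dew\r\n";
my ($head, $body) = split_message($raw);

print "HEAD:\n$head\n\nBODY:\n$body";
```

    For a plain ASCII message that body is already the text you want; for a MIME message it is the still-encoded bundle of parts, which is what the original question ran into.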

    Bodies are required to be ASCII, which limits you to 7 bits, but someone thought it would be a good idea to send non-text files, and so came up with MIME. Using MIME, the body may be identified as consisting of one or more encapsulated objects. To mark the body as MIME encoded, additional headers are inserted into the header of the email message.

    There's a fair bit of background information in the MIME::Tools documentation.
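
    As a rough sketch of how the text part could be pulled out with MIME::Parser (from the MIME-tools distribution), assuming the module is installed: the helper name plain_text_parts and the sample message are made up for illustration, and the result is the decoded bytes of each text/plain part, with no charset conversion applied.

```perl
use strict;
use warnings;
use MIME::Parser;    # from the MIME-tools distribution on CPAN

# Hypothetical helper: walk the entity tree and return the decoded
# content of every text/plain part, skipping HTML and attachments.
sub plain_text_parts {
    my ($entity) = @_;
    return map { plain_text_parts($_) } $entity->parts if $entity->is_multipart;
    return () unless $entity->effective_type eq 'text/plain';
    my $bh = $entity->bodyhandle or return ();
    return $bh->as_string;    # transfer encoding already undone by the parser
}

# Made-up two-part message: the same text once as plain text, once as HTML.
my $raw = join "\n",
    'MIME-Version: 1.0',
    'Content-Type: multipart/alternative; boundary="b1"',
    '',
    '--b1',
    'Content-Type: text/plain; charset="us-ascii"',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    '[1] morning dew =3D mayflies',
    '--b1',
    'Content-Type: text/html',
    '',
    '<p>[1] morning dew = mayflies</p>',
    '--b1--',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory, no files on disk
my $entity = $parser->parse_data($raw);
my @text   = plain_text_parts($entity);

print "$_\n" for @text;
```

    The recursion matters because mail clients often nest multipart/alternative inside multipart/mixed, so the text/plain part is not always at the top level.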

Re^3: Extracting TEXT from email
by thcsoft (Monk) on May 01, 2005 at 11:11 UTC
    what about a glance at perlretut?
    language is a virus from outer space.