in reply to Extracting TEXT from email

It would help if you explained what the input was. How is the text that you are trying to get encoded in the email? What are you counting as text? (i.e., based on the input, what are you trying to get as the output?)

There are many, many ways to encode text into an e-mail (MIME, PGP, PGP+MIME, UUEncode, BinHex, BinHex+MIME, Quoted Printable, etc.). Without knowing what you're dealing with, we can only guess at what you're asking for.

Re^2: Extracting TEXT from email
by ady (Deacon) on Apr 30, 2005 at 15:36 UTC
    I'm at the "receiving end" of the mail wire: I receive mails in my (MS Windows Exchange) inbox encoded in standard mail/MIME format.

    I'm interested in the text part of the body of these mails, that is, what follows the mail header (i.e. the From:, Sent:, To:, Issue: stuff). The body contains haiku entries that I parse and reshuffle into a voting list, and subsequently rank according to the votes received, but the app as such is not that interesting in this context.
    An example of the text part of a mail message is:

    [author] xxx yyy
    [1] clouds . . . the distance blossoming between two crows
    [2] a morning without incident dead fly
    [3] sunrise ceremony the holy man's third eye bloodshot
    [4] morning dew bell bottoms darkened by mayflies

    This is what I'm interested in parsing out, and this is the text part of the message that is displayed in the mail client (in this case, MS Outlook).

    The problem is that the above text is not what I get from the mail body handed over by the mentioned MIME modules. Instead I get the full mail body segment, including binary MIME encodings and HTML tagging.

    So I have to do some filtering to get at the text "payload" that I need for the app. Now I was wondering whether anybody had already wrapped this functionality into a function, possibly in a MIME module. That was my question.

    I haven't worked with email before, so maybe I'm simply overlooking some basic assumptions about the MIME format & parsing...
    -- allan

      I find it's good to understand what you're working with. (This coming from someone who deals with data at work without having any idea what it actually means.)

      Basically, email is sent using SMTP as what it calls a mail object, which is composed of headers, an empty line, and a message body.
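
      To make the "headers, an empty line, and a message body" layout concrete, here is a minimal core-Perl sketch that splits a raw message at the first empty line. The sub name split_message and the sample message are made up for illustration, and it knows nothing about MIME parts or encodings.

```perl
use strict;
use warnings;

# Split a raw message into its header block and its body.
# The separator is the first empty line (CRLF or bare LF line endings).
sub split_message {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    return ($head, defined $body ? $body : '');
}

# Hypothetical sample message, for illustration only.
my $raw = join "\r\n",
    'From: someone@example.com',
    'Subject: haiku',
    '',
    '[1] clouds . . .',
    'the distance blossoming';

my ($head, $body) = split_message($raw);
print "HEAD:\n$head\n\nBODY:\n$body\n";
```

      The limit of 2 on split matters: later empty lines belong to the body and must not be split on.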

      Bodies are required to be ASCII, which limits you to 7 bits, but someone thought it would be a good idea to send non-text files, so came up with MIME. Using MIME, the body may be identified as being one or more encapsulated objects. To mark the body as being MIME encoded, there are additional headers inserted into the header section of the email message.
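
      Those additional headers are MIME-Version, Content-Type and Content-Transfer-Encoding. As a rough sketch (core Perl only; parse_headers and describe_body are names I made up, and the regexes are deliberately simplistic), you can look at them to see what kind of body you were handed:

```perl
use strict;
use warnings;

# Turn a raw header block into a hash keyed by lower-cased field name.
# Repeated fields (e.g. Received:) overwrite each other, which is fine
# here because the MIME fields appear only once.
sub parse_headers {
    my ($head) = @_;
    $head =~ s/\r?\n[ \t]+/ /g;    # unfold continuation lines
    my %field;
    for my $line (split /\r?\n/, $head) {
        next unless $line =~ /^([^:\s]+)\s*:\s*(.*)$/;
        $field{lc $1} = $2;
    }
    return \%field;
}

# Summarise what the MIME headers say about the body.
sub describe_body {
    my ($field) = @_;
    # Without MIME-Version, treat the body as plain 7-bit text.
    return { mime => 0, type => 'text/plain', encoding => '7bit' }
        unless exists $field->{'mime-version'};
    my $ctype = $field->{'content-type'} // 'text/plain';
    my ($type)     = $ctype =~ m{^\s*([^;\s]+)};
    my ($boundary) = $ctype =~ /boundary\s*=\s*"?([^";]+)"?/i;
    return {
        mime     => 1,
        type     => lc($type // 'text/plain'),
        boundary => $boundary,    # undef unless the type is multipart/*
        encoding => lc($field->{'content-transfer-encoding'} // '7bit'),
    };
}

# Hypothetical header block, for illustration only.
my $head = join "\r\n",
    'From: someone@example.com',
    'MIME-Version: 1.0',
    'Content-Type: multipart/alternative;',
    "\tboundary=\"----=_Part_42\"";

my $info = describe_body(parse_headers($head));
print "$info->{type} (boundary: $info->{boundary})\n";
```

      A multipart/* type with a boundary means the body is a container of parts, each with its own small header block, so "the text" is one of those parts rather than the body as a whole.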

      There's a fair bit of background information in the MIME::Tools documentation.
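
      For the "has anybody wrapped this into a function" part, something along these lines might do. It is only a sketch: it assumes MIME::Parser from the MIME-tools distribution is installed, the sub names plain_text_parts and extract_plain_text are mine, and the sample message is invented. It walks the parsed message and keeps the decoded body of every text/plain part.

```perl
use strict;
use warnings;
use MIME::Parser;    # from the MIME-tools distribution on CPAN

# Collect the decoded body of every text/plain leaf part, depth first.
sub plain_text_parts {
    my ($entity) = @_;
    my @parts = $entity->parts;
    return map { plain_text_parts($_) } @parts if @parts;
    return () unless $entity->effective_type eq 'text/plain';
    my $body = $entity->bodyhandle or return ();
    return $body->as_string;    # transfer encoding (base64, QP) already undone
}

# Hypothetical wrapper: raw message in, plain text out.
sub extract_plain_text {
    my ($raw) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);    # keep decoded bodies in memory, not temp files
    my $entity = $parser->parse_data($raw);
    return join "\n", plain_text_parts($entity);
}

# Invented sample: the same haiku as plain text and as HTML.
my $raw = <<'END';
From: poet@example.com
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XX"

--XX
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

[1] morning dew =
bell bottoms

--XX
Content-Type: text/html; charset="us-ascii"

<html><body><p>[1] morning dew bell bottoms</p></body></html>

--XX--
END

print extract_plain_text($raw);
```

      Two caveats: as_string gives you bytes, so for anything other than ASCII you would still decode them with Encode according to the part's charset, and a message that was sent as HTML only has no text/plain part, in which case you are back to stripping tags from the text/html part.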

      what about a glance at perlretut?
      language is a virus from outer space.