Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

Older questions and answers on the subject seem to point to Mail::POP3Client as a nice module for reading email, and I agree: it is. However, I'm struggling with the wrapping of long lines.

Basically: most mail servers/clients seem to hard wrap lines longer than 76 characters when sending email. Outlook (and similar clients) know how to reconstruct these into single lines at the other end, but I can't for the life of me figure out how to do that with Mail::POP3Client. All I get are individual lines, with no indication that they were supposed to be one.

Presumably, this is a limitation of the POP3 protocol itself rather than of the Mail::POP3Client Perl module, but it's really doing my head in. Does anyone know of any workarounds? Any help appreciated.

Cheers,
Marc

Re: POP3 and long lines (Mail::POP3Client)
by mr_mischief (Monsignor) on Nov 19, 2008 at 17:43 UTC
    I think the heuristic in common use is similar to this:

    • If there are two line endings in a row, it's a new paragraph and they should be kept.
    • If there's a quoting character at the beginning of a line, the line ending for the previous line should be kept. Quote characters include:
      • >
      • ::
      • name:
      • name>
      ... and any number of nested quote characters after the first without a line ending in between stay part of the same line.
    • Otherwise, strip the line ending for side-scrolling or soft wrapping (or reformat the lines with a new hard wrap to the maximum length the client sets).
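A minimal sketch of that heuristic, written in Python rather than Perl and working on a body you have already fetched. The quote-prefix regex is an assumption, and so is the extra rule that a plain line directly after a quoted line starts a new line; note that the `name:` pattern will also match ordinary text such as "Note: ...".

```python
import re

# Assumed quote-prefix pattern: ">", "::", "name:" or "name>", possibly
# repeated for nested quoting (e.g. "> > " or "bob> >").
QUOTE_PREFIX = re.compile(r"^\s*(?:(?:>|::|\w+[:>])\s*)+")


def unwrap(body: str) -> str:
    """Rejoin hard-wrapped lines using the blank-line / quote-prefix heuristic."""
    out = []          # finished logical lines
    current = None    # logical line being rebuilt, or None
    quoted = False    # whether `current` started with a quote prefix
    for line in body.splitlines():
        if not line.strip():
            # Two line endings in a row: a paragraph break, so keep it.
            if current is not None:
                out.append(current)
                current = None
            out.append("")
        elif QUOTE_PREFIX.match(line):
            # Quoted line: keep the previous line's ending.
            if current is not None:
                out.append(current)
            current, quoted = line, True
        elif current is not None and not quoted:
            # Ordinary continuation: drop the line ending, join with one space.
            current = current.rstrip() + " " + line.strip()
        else:
            # First line, or a plain line right after a quoted one (assumption).
            if current is not None:
                out.append(current)
            current, quoted = line, False
    if current is not None:
        out.append(current)
    return "\n".join(out)


body = ("This is a long line that\nwas wrapped by the sender.\n\n"
        "> quoted text\n> more quoted\nReply here.")
print(unwrap(body))
```

Quoted lines are deliberately left alone, as the heuristic says; only unquoted runs between blank lines get joined.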

    You'd have to do that with the data after you've already retrieved it from the POP3 server. The protocol itself is just a text dump and is not concerned with formatting in any way other than standardized line endings and a blank line between headers and the body.
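To illustrate the "headers, blank line, body" layout, here is a Python sketch that takes the raw lines of a retrieved message and uses the standard `email` package to split headers from body. The lines are hard-coded in place of a live server (Python's `poplib.POP3.retr()` returns such a list as the second item of its result); the sample message and the `parse_retrieved` helper are hypothetical. The body still comes back as separate physical lines, which is why any rejoining has to happen afterwards.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical stand-in for what a POP3 RETR hands back: a list of raw lines
# with the CRLFs already removed (e.g. the second item of poplib.POP3.retr()).
raw_lines = [
    b"From: marc@example.com",
    b"Subject: test",
    b"Content-Type: text/plain; charset=us-ascii",
    b"",                                   # blank line ends the headers
    b"First physical line of a long",
    b"logical line.",
]


def parse_retrieved(lines):
    """Rejoin the raw lines and let the email package split headers from body."""
    msg = message_from_bytes(b"\r\n".join(lines), policy=default)
    # get_content() decodes the body's charset/transfer encoding but leaves
    # its line breaks exactly as the sender wrote them.
    return msg, msg.get_content()


msg, body = parse_retrieved(raw_lines)
print(msg["Subject"])        # test
print(body.splitlines())     # still two separate physical lines
```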

      Thanks for the reply. I'll try that tomorrow, though I think I'm still going to struggle, given that the mails I need to parse are of the form:
      Longline1
      Longline2
      etc
      Rather than:
      Longline1

      Longline2

      etc
      I'll see how it goes though.
Re: POP3 and long lines (Mail::POP3Client)
by eye (Chaplain) on Nov 19, 2008 at 20:17 UTC
    I believe RFC2646 (since obsoleted by RFC3676) specifies the line-wrapping mechanism that is vexing you. Its intent is to describe
    [...]a format which is in all significant ways Text/Plain, and therefore is quite suitable for display as Text/Plain, and yet allows the sender to express to the receiver which lines can be considered a logical paragraph, and thus flowed (wrapped and joined) as appropriate.

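    I believe these messages should include "Format=Flowed" in the "Content-Type:" header line. The clues to reconstructing paragraphs are mostly hidden in the whitespace: a line that ends with a space just before the CRLF is a soft break and can be joined to the line that follows it.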
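A simplified Python sketch of the receiving side, following RFC3676 (the successor to RFC2646): check the `format` parameter of `Content-Type`, then treat a line ending in a space as a soft break, remove one leading space of space-stuffing, and only join lines that have the same `>` quote depth. It skips some corner cases in the RFC (for example quote prefixes written as `> >`), and the helper names are hypothetical.

```python
from email import message_from_string
from email.policy import default


def decode_flowed(text: str, delsp: bool = False) -> str:
    """Simplified RFC 3676 format=flowed decoder (quote depth, soft breaks)."""
    out = []        # finished (quote_depth, text) pairs
    para = None     # open flowed paragraph as (quote_depth, text), or None
    for raw in text.splitlines():
        depth = len(raw) - len(raw.lstrip(">"))   # number of leading '>'
        line = raw[depth:]
        if line.startswith(" "):                  # undo space-stuffing
            line = line[1:]
        # Trailing space = soft (flowed) break, except the "-- " sig separator.
        flowed = line.endswith(" ") and line != "-- "
        if flowed and delsp:
            line = line[:-1]                      # DelSp=yes: drop that space
        if para is not None and para[0] == depth:
            para = (depth, para[1] + line)        # same depth: join
        else:
            if para is not None:                  # depth changed: close it
                out.append(para)
            para = (depth, line)
        if not flowed:                            # hard break ends paragraph
            out.append(para)
            para = None
    if para is not None:
        out.append(para)
    return "\n".join(">" * d + (" " if d else "") + t for d, t in out)


def body_text(raw_message: str) -> str:
    """Return the body, unflowed only if Content-Type says format=flowed."""
    msg = message_from_string(raw_message, policy=default)
    body = msg.get_content()
    if (msg.get_content_type() == "text/plain"
            and (msg.get_param("format") or "").lower() == "flowed"):
        delsp = (msg.get_param("delsp") or "").lower() == "yes"
        body = decode_flowed(body, delsp)
    return body


sample = ("Content-Type: text/plain; charset=us-ascii; format=flowed\n\n"
          "This is a flowed \nparagraph.\n> quoted and \n> flowed\n")
print(body_text(sample))
```

If the header does not carry the parameter, `body_text()` returns the body untouched, so a heuristic is still needed for mail that was not sent as flowed text.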

    On line lengths, RFC5322 specifies (section 2.1.1):

    Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF.

    This is not a POP3 issue.
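Because senders must stay under 998 characters per line and should stay under 78, a long paragraph will normally arrive split across several physical lines. A quick Python sketch (the helper is hypothetical; it counts Python characters, which match the RFC's characters only for ASCII text) that reports which lines exceed each limit:

```python
def check_line_lengths(body: str):
    """Classify lines against RFC 5322 section 2.1.1 (line ending excluded)."""
    must_violations, should_violations = [], []
    lines = body.replace("\r\n", "\n").split("\n")
    for number, line in enumerate(lines, start=1):
        if len(line) > 998:
            must_violations.append(number)      # breaks the MUST (998)
        elif len(line) > 78:
            should_violations.append(number)    # only breaks the SHOULD (78)
    return must_violations, should_violations


sample = "a" * 78 + "\r\n" + "b" * 79 + "\r\n" + "c" * 999
print(check_line_lengths(sample))   # ([3], [2])
```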

      I've never noticed a 'Format=Flowed' in the header (and I've hit this problem with plain, rich and HTML text), but I'll definitely take a closer look at what each whitespace and control character in the line actually is, rather than taking it at face value.

      Cheers for the pointer.
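A Python sketch for that kind of inspection (the helper name is hypothetical): it takes the raw bytes of a message body and reports, for each physical line, its exact line ending and any whitespace just before it, so a soft-break space or a stray CR shows up explicitly.

```python
def line_report(raw: bytes):
    """For each physical line, return (text, line ending, trailing whitespace)."""
    report = []
    for line in raw.splitlines(keepends=True):
        text = line.rstrip(b"\r\n")             # drop only CR/LF
        ending = line[len(text):]               # b"\r\n", b"\n", b"\r" or b""
        trailing = text[len(text.rstrip()):]    # spaces/tabs before the break
        report.append((text, ending, trailing))
    return report


for text, ending, trailing in line_report(b"soft \r\nhard\r\nlast"):
    print(f"{text!r:12} ending={ending!r:8} trailing={trailing!r}")
```

Under format=flowed, a trailing space before the CRLF marks a soft break; if nothing precedes the CRLF, the break carries no flowed marker.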