I think the heuristic in common use is similar to this:
- If there are two line endings in a row, it's a new paragraph and they should be kept.
- If there's a quoting character at the beginning of a line, the line ending for the previous line should be kept. Quote characters include:
... and any number of nested quote characters after the first without a line ending in between stay part of the same line.
- Otherwise, strip the line ending for side-scrolling or soft wrapping (or reformat the lines with a new hard wrap to the maximum length the client sets).
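As a rough illustration of the heuristic above, here is a minimal Python sketch. The function name `unwrap` and the choice of ">" as the only quote character are my own assumptions (the list of quote characters is not spelled out above), so adjust `quote_chars` to whatever your mails actually use. It applies the three rules literally and joins everything else with a single space.

```python
def unwrap(body, quote_chars=(">",)):
    """Join hard-wrapped lines back into paragraphs.

    quote_chars is an assumption: pass a tuple of whatever prefixes
    your mail uses to mark quoted text.
    """
    lines = body.replace("\r\n", "\n").split("\n")
    out = []
    for line in lines:
        keep_break = (
            not out                             # very first line
            or line == ""                       # two line endings in a row
            or out[-1] == ""                    # first line of a new paragraph
            or line.startswith(quote_chars)     # quoted line keeps the previous break
        )
        if keep_break:
            out.append(line)
        else:
            # Otherwise strip the line ending and join with one space.
            out[-1] = out[-1].rstrip() + " " + line.lstrip()
    return "\n".join(out)
```

For example, `unwrap("one\ntwo\n\nthree")` gives back `"one two\n\nthree"`. Re-wrapping to a client width could then be done per paragraph with the standard `textwrap` module.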
You'd have to do that with the data after you've already retrieved it from the POP3 server. The protocol itself is just a text dump and is not concerned with formatting in any way other than standardized line endings and a blank line between headers and the body.
Thanks for the reply. I'll try that tomorrow, though I think I'm still going to struggle given the mails I need to parse are of the form:
Longline1
Longline2
etc
Rather than:
Longline1
Longline2
etc
I'll see how it goes though.
I believe RFC 2646 specifies the line wrapping mechanism that is vexing you. Its intent is to describe
[...]a format which is in all significant ways Text/Plain, and therefore is quite suitable for display as Text/Plain, and yet allows the sender to express to the receiver which lines can be considered a logical paragraph, and thus flowed (wrapped and joined) as appropriate.
I believe these messages should include "Format=Flowed" in the "Content-Type:" header line. The clues to reforming paragraphs are mostly hidden in the whitespace.
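To make the whitespace clue concrete, here is a hedged Python sketch using the standard `email` module. The helper names `is_flowed` and `reflow` are hypothetical, and the sketch only covers the core of the flowed format as I understand it: a line ending in a space is a soft break, a leading space is space-stuffing, and the signature separator "-- " is never joined. Quoted (">") lines are deliberately not handled here.

```python
import email

def is_flowed(msg):
    """True if the Content-Type header carries format=flowed."""
    fmt = msg.get_param("format", failobj="fixed")
    return str(fmt).lower() == "flowed"

def reflow(body):
    """Join soft-broken lines (those ending in a space) into paragraphs."""
    out = []
    pending = ""  # text collected from soft-broken lines so far
    for line in body.replace("\r\n", "\n").split("\n"):
        if line.startswith(" "):
            line = line[1:]              # undo space-stuffing
        if line == "-- ":
            # Signature separator: flush any pending text, keep line as-is.
            if pending:
                out.append(pending)
                pending = ""
            out.append(line)
        elif line.endswith(" "):
            pending += line              # trailing space doubles as word separator
        else:
            out.append(pending + line)   # hard break ends the paragraph
            pending = ""
    if pending:
        out.append(pending)
    return "\n".join(out)
```

Usage would be along the lines of `msg = email.message_from_string(raw)` followed by `reflow(msg.get_payload())` when `is_flowed(msg)` is true. If the header is absent, the trailing spaces probably are too, and the heuristic approach is the fallback.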
On line lengths, RFC 5322 specifies (section 2.1.1):
Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF.
This is not a POP3 issue.
I've never noticed a 'Format=Flowed' in the header (and I've hit this problem with plain, rich and HTML text), but I'll definitely have a closer look at what each whitespace and control character in the line actually is, rather than taking it at face value.
Cheers for the pointer.
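For anyone wanting to do the same inspection, one simple way (a sketch only, with a hypothetical helper name `show_line_endings`) is to keep the message as raw bytes and print the `repr` of each line, so trailing spaces, `\r`, `\n` and other control characters become visible:

```python
def show_line_endings(raw):
    """Return the repr of each line of a bytes message, endings included."""
    return [repr(line) for line in raw.splitlines(keepends=True)]

# Example: a trailing space before CRLF shows up explicitly.
for text in show_line_endings(b"Longline1 \r\nLongline2\n"):
    print(text)
```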