Hi fellow monks!

I'm currently working on a script that retrieves an email via POP3, performs some text replacements on the email body, and sends it out again.

Right now it performs the text replacements via regex on the raw message content. That works OK except when an email is encoded as Quoted-Printable, where long lines are split across multiple lines with a trailing "=" (a soft line break), like this:

Thank you Joanne for your informative email and for organising the =
mailing list - let's all commit to...

Since words and phrases can now be split across multiple lines, my regexes fail whenever the text they should match falls on a line boundary.

I'm aware of the MIME::QuotedPrint module, so I tried stitching the email lines together (separated by "\n"s) and passing the result through MIME::QuotedPrint::decode, but the real trouble comes when some lines are encoded like this:

centre)<BR></FONT>&nbsp;<BR><FONT color=3D#c00000><STRONG>31/5/09=20
</STRONG></FONT><FONT color=3D#3f3f3f>Medical work (including=20
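
For reference, here is a minimal sketch of the stitch-and-decode step described above. It assumes the headers have already been stripped and that the whole body is Quoted-Printable; the `@lines` contents are shortened, made-up versions of the samples above, and the replacement regex is only a placeholder.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp encode_qp);

# Hypothetical body lines, as they might arrive from the POP3 server
# (headers already removed; this is an assumption of the sketch).
my @lines = (
    'Thank you Joanne for your informative email and for organising the =',
    "mailing list - let's all commit to...",
    '<FONT color=3D#c00000><STRONG>31/5/09=20',
    '</STRONG></FONT>',
);

# Join with "\n" and decode once: "=" at end of line is a soft line
# break and is removed, "=3D" becomes "=", "=20" becomes a space.
my $decoded = decode_qp(join("\n", @lines) . "\n");

# The replacement now sees the phrase on one line.
(my $edited = $decoded) =~ s/the mailing list/our mailing list/;

# Re-encode before sending the message back out.
my $reencoded = encode_qp($edited);

print $edited;
```

Decoding the whole body in one call, rather than line by line, is what lets the soft line breaks be rejoined before the regexes run.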

I have not had a chance to experiment with emails that contain MIME-encoded file attachments yet, but I suspect passing them through MIME::QuotedPrint::decode will not work well either.
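
One way to avoid decoding the wrong thing might be to look at each part's Content-Transfer-Encoding header and only apply the matching decoder. The sketch below illustrates just that dispatch using core modules; `decode_body` is a hypothetical helper, and splitting a multipart message into parts is not shown (a full MIME parser from CPAN, such as MIME::Parser, would be the usual tool for that).

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: decode one body according to the value of its
# Content-Transfer-Encoding header. Anything that is not
# quoted-printable or base64 is returned untouched.
sub decode_body {
    my ($encoding, $body) = @_;
    $encoding = lc($encoding // '7bit');   # missing header: treat as 7bit
    return decode_qp($body)     if $encoding eq 'quoted-printable';
    return decode_base64($body) if $encoding eq 'base64';
    return $body;                          # 7bit, 8bit, binary
}

# Made-up sample bodies, one per encoding.
print decode_body('quoted-printable', "a=3Db =\nc\n");
print decode_body('base64', "SGVsbG8=\n"), "\n";
print decode_body('7bit', "x=3Dy"), "\n";
```

With this split, base64 attachments never pass through the Quoted-Printable decoder, and text replacements can be limited to the parts that were actually decoded as text.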

What's the best way I can go about solving this problem?


In reply to Email parsing CPAN module? by woei
