Jobby has asked for the wisdom of the Perl Monks concerning the following question:

Hello, all. I have a small mailing list whose entries I wish to archive on a website as soon as they come in. Here's the code I'm using at the moment:

my $entry;
if ($mail->header("Content-Type") =~ /plain/) {
    logtext("Plain text. Marking up.");
    my $conv = new HTML::TextToHTML();
    $entry = $conv->process_para($mail->body);
}
else {
    logtext("Not plain text. Scrubbing.");
    my $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );
    $entry = $scrubber->scrub($mail->body);
}

This solution works pretty well for most mails (list members either send from Hotmail accounts or use plain text), but for HTML mails sent from Outlook Express it fails completely. I end up with '=3D' and '=20' at the end of every line. Should I write code to deal with this as a special case or has someone already solved this problem?

Re: Sanitising email for posting on website
by Jenda (Abbot) on Oct 03, 2003 at 14:38 UTC

    You forgot to decode the message. It was sent with the quoted-printable encoding.

    Add something like:

    use MIME::Base64;
    use MIME::QuotedPrint;
    ...
    my $body;
    if ($mail->header('Content-transfer-encoding') =~ /quoted-print/i) {
        $body = decode_qp($mail->body);
    }
    elsif ($mail->header('Content-transfer-encoding') =~ /base64/i) {
        $body = decode_base64($mail->body);
    }
    else {
        $body = $mail->body;
    }
    and then use $body instead of $mail->body in the code that follows.

    HTH, Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne

    Edit by castaway: Closed small tag in signature
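
For anyone who wants to try the decoding step from the reply above in isolation, here is a minimal sketch. The helper name decode_body and the sample string are made up for illustration and are not from the thread; the only library calls used are decode_qp and decode_base64 from MIME::QuotedPrint and MIME::Base64. It also treats a missing Content-Transfer-Encoding header as "no decoding needed", which is an assumption about what the archive script should do in that case.

```perl
use strict;
use warnings;
use MIME::Base64      qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical helper (not from the thread): takes the value of the
# Content-Transfer-Encoding header (may be undef) and the raw body,
# and returns the decoded body.
sub decode_body {
    my ($encoding, $raw) = @_;
    $encoding = '' unless defined $encoding;    # header may be absent

    return decode_qp($raw)     if $encoding =~ /quoted-print/i;
    return decode_base64($raw) if $encoding =~ /base64/i;
    return $raw;    # 7bit, 8bit, binary, or no header: pass through
}

# Made-up sample in the style of a quoted-printable HTML mail:
# "=3D" encodes "=", "=20" encodes a space, and a lone "=" at the
# end of a line is a soft line break that joins the two lines.
my $sample = qq{<p align=3D"left">Hello, list.=20\n}
           . qq{This sentence was wrapped by the =\nmail client.</p>\n};

print decode_body('quoted-printable', $sample);
```

In the original script this would presumably be called as decode_body(scalar $mail->header('Content-Transfer-Encoding'), $mail->body), with the result passed to the text-to-HTML converter or the scrubber. The scalar is there because some mail classes return a list of header values in list context, and an empty list would shift the arguments; whether that applies depends on which class $mail actually is.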