Dear Monks,

I'm using Mail::Pop3Client to retrieve messages from an Exchange account over an SSL connection. I've run into a few problems when getting the email body using Body():

1. HTML emails contain more than just HTML tags: they include some header info and miscellaneous information that varies depending on which carrier the email is from. Text is even duplicated, and special characters are converted differently. I've tried using regular expressions to clean it up, but it's getting hairy to account for all the discrepancies, and I want to make sure this will work with ANY carrier. I can make all this work, however...

2. Emails originating from Cox (there may be more providers; this is just the one that I've discovered so far) get translated to gibberish. E.g., the original plain-text mail says 'This is a test...', but Body() returns 'VGhpcyBpcyBhIHRlc3QuLi4NCg==' -- that's it -- and the HTML email returns a basic header followed by lines and lines of seemingly random characters. Viewing the email in Outlook looks normal.
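For what it's worth, the string in item 2 has the shape of Base64 rather than random gibberish. Here is a minimal sketch of decoding it with the core MIME::Base64 module. It assumes the message was sent with a base64 Content-Transfer-Encoding, which I haven't confirmed from the headers; the variable names are only for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# The exact string Body() returned for the Cox plain-text message.
my $raw = 'VGhpcyBpcyBhIHRlc3QuLi4NCg==';

# Assumption: the part is base64-encoded (not verified against the headers).
my $text = decode_base64($raw);

# The decoded text ends in CRLF; drop it before printing.
$text =~ s/\r?\n\z//;

print "$text\n";    # prints: This is a test...
```

If that is what's going on, the HTML message from the same provider is probably encoded the same way, one part at a time.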

The ultimate goal is to take the body of the message as plain, simple text and send it as a text message to a cell phone. Any extra or confusing characters are unacceptable. Is there a better module out there that I should be using instead, or is there a way to make this work?

for (my $i = 1; $i <= $messages; $i++) {
    foreach ($pop->Head($i)) {
        if ($_ =~ /From:[^<>]*\<(.*)\>/) {
            print "From: $1 ";
            $emails[$i]->{'from'} = $1;
        }
        if ($_ =~ /Subject:(.*)/) {
            #print "To: $1 ";
            $emails[$i]->{'subject'} = $1;
        }
    }
    my $body = $pop->Body($i);
    $body =~ s/\n/ /g;
    $body =~ s/\r/ /g;
    $body =~ s/^.*\<body[^<>]*\>(.*)/$1/;
    $body =~ s/(.*)\<\/body[^<>]*\>.*$/$1/;
    while ($body =~ /[<>]/) {
        $body =~ s/\<[^<>]*\>(.*)/$1/;
    }
    $body =~ s/&#8217;/\'/g;
    $body =~ s/&#39;/\'/g;
    $body =~ s/\=92/\'/g;
    $body =~ s/\=A0/ /g;
    $body =~ s/\=\s //g;
    $body =~ s/\&nbsp;/ /g;
    $body =~ s/\s{2,}/ /g;
    $emails[$i]->{'body'} = $body;
    #$pop->Delete($i);
    print "Message: ".$body."\n\n";
}
$pop->Close();
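The =92 and =A0 substitutions in the code above look like hand-rolled quoted-printable decoding, so one possible alternative is to undo the transfer encoding as a whole before stripping any markup. The following is only a sketch using the core MIME::Base64 and MIME::QuotedPrint modules. The helper decode_transfer() is hypothetical, and it assumes the Content-Transfer-Encoding value has already been pulled out of the headers for the part being decoded; multipart messages would still need to be split into parts first.

```perl
use strict;
use warnings;
use MIME::Base64      qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical helper: undo the encoding named in a part's
# Content-Transfer-Encoding header. Unknown or missing values
# (7bit, 8bit, binary) pass through unchanged.
sub decode_transfer {
    my ($encoding, $body) = @_;
    $encoding = defined $encoding ? lc $encoding : '7bit';
    return decode_base64($body) if $encoding eq 'base64';
    return decode_qp($body)     if $encoding eq 'quoted-printable';
    return $body;
}

# Base64 sample taken from the post.
my $plain = decode_transfer('base64', 'VGhpcyBpcyBhIHRlc3QuLi4NCg==');

# Made-up quoted-printable sample: =92 is a Windows-1252 curly
# apostrophe and a trailing "=" marks a soft line break.
my $quoted = decode_transfer('quoted-printable', "Here=92s a soft=\n break");
$quoted =~ s/\x92/'/g;    # map the curly apostrophe to plain ASCII

print $plain, $quoted, "\n";
```

Decoding first means the entity and tag clean-up only ever sees real text, instead of having to guess at every =XX sequence a provider might emit.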

In reply to Mail::Pop3Client - want to get consistent body text by ksublondie
