in reply to Re: Decoding an email body, into utf8
in thread Decoding an email body, into utf8

Hi,

Thanks - that kinda works :)

The output I'm getting now in SSH, is:

FOO: testing a reply *Do it as *html     I guess Andy Newby  *Email:*     andy@xx.co.uk   * WWW:      * http://www.xx.co.    uk   Mobile: *      07769 201 576    Thanks

Andy

Re^3: Decoding an email body, into utf8
by hippo (Archbishop) on Jul 22, 2016 at 13:45 UTC

    And what output do you expect? It would help to see this in the form of a test, e.g. as described in "How to ask better questions using Test::More and sample data".

    Note that you will need a UTF-8-capable terminal with the correct locale set in order to view UTF-8 data (which the output appears to be).

    (updated: added link to test example)

      Ah, that was it. Bit surprised, as that section shouldn't really have any utf8 in it (just plain text)

      Anyway, all I need to do now is check what the email's encoding is and, if it's utf8, convert it.

      Thanks!

      Andy
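A rough sketch of the "check the encoding, then convert" step, using the core Encode module. The helper name decode_body and the sample Content-Type values are made up for illustration, and it assumes the body has already had any quoted-printable or base64 transfer encoding removed:

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: pick the charset out of a Content-Type header value
# and decode the raw body bytes into a Perl character string.
sub decode_body {
    my ($content_type, $bytes) = @_;

    # Grab the charset parameter, with or without surrounding quotes
    my ($charset) = $content_type =~ /charset\s*=\s*"?([^\s";]+)/i;

    # Fall back to US-ASCII if the charset is missing or unknown to Encode
    $charset = 'US-ASCII' unless defined $charset && find_encoding($charset);

    return decode($charset, $bytes);
}

my $text = decode_body('text/plain; charset="UTF-8"', "caf\xC3\xA9");

# Encode on the way out so a UTF-8 terminal shows the right characters
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

Decoding by whatever charset the message declares, rather than special-casing utf8, means ISO-8859-1 and other mails go through the same path.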

        Words like résumé and piñon are plain text and many punctuation marks are too (dashes, ellipsis, real quotes and apostrophes, etc). UTF-8 is the best (default) encoding for plain text.
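To make the point concrete, here is a small sketch (the sample string is just an illustration) showing that such "plain text" takes more bytes than characters in UTF-8, which is why it has to be decoded on input and encoded on output:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes, as they might arrive in a mail body
my $bytes = "r\xC3\xA9sum\xC3\xA9 and pi\xC3\xB1on";

# Decode the bytes into a character string
my $text = decode('UTF-8', $bytes);

# Each accented letter is two bytes but one character
printf "%d bytes, %d characters\n", length($bytes), length($text);

# Tell Perl to encode characters as UTF-8 when printing
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

The terminal still needs to be UTF-8 capable for the last line to display correctly.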

      Mmmm actually, that doesn't work in some cases:

      my $name = "From: =?UTF-8?B?QW5keSBOZXdieSDDrcOpw7M=?= <andy\@chambresdhotesfrance.com>";
      use MIME::QuotedPrint;
      if ($name =~ /utf-8|utf8/i) {
          $name = decode_qp($name);
          $name =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
      }
      print $IN->header;
      print "NOW: $name";

      Prints out:

      NOW: From: =?UTF-8?B?QW5keSBOZXdieSDDrcOpw7M=?= 
      ...instead of what I was expecting:

      Andy Newby íóé

      This is a valid email header passed through from Thunderbird. Any ideas why it won't decode?

      Cheers

      Andy

        Interesting. I just found a post on Stack Overflow where someone suggested using decode_base64 to decode it, and that seems to work:

        my $name = "=?UTF-8?B?QW5keSBOZXdieSDDrcOpw7M=?=";
        use MIME::Base64;
        $name =~ s|\Q=?UTF-8?B?||i;
        $name = decode_base64($name);
        $name =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;

        I wonder why the other module doesn't work?

        Cheers

        Andy
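On why the other module doesn't work: the header is an RFC 2047 "encoded word" of the form =?charset?encoding?text?=, and the B in the middle means the text is base64 (a Q there would mean a quoted-printable variant). decode_qp only handles quoted-printable escapes such as =C3=A9, so it leaves the string untouched. Rather than stripping the prefix by hand, one option is the core Encode module's MIME-Header decoding, which handles both B and Q forms and the charset in one step. A minimal sketch, assuming a reasonably recent Encode is installed (variable names are just for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Single quotes so the @ in the address is not interpolated
my $raw = 'From: =?UTF-8?B?QW5keSBOZXdieSDDrcOpw7M=?= <andy@chambresdhotesfrance.com>';

# Decode any encoded words in the header into a character string
my $header = decode('MIME-Header', $raw);

# The same call works on the bare encoded word
my $name = decode('MIME-Header', '=?UTF-8?B?QW5keSBOZXdieSDDrcOpw7M=?=');

# Encode as UTF-8 on output so the accented letters display correctly
binmode(STDOUT, ':encoding(UTF-8)');
print "NOW: $header\n";
print "Name: $name\n";
```

For this header the name comes out as "Andy Newby íéó", with the accented letters in the order they were encoded. Since the result is already a character string, the from_utf8 substitution should not be needed afterwards.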