And what output do you expect? It would help to see this in the form of a test, e.g. "How to ask better questions using Test::More and sample data".

Note that you will need a UTF-8 capable terminal with the correct locale set in order to view UTF-8 data (which your output appears to be).

(updated: added link to test example)

Re^4: Decoding an email body, into utf8
by ultranerds (Hermit) on Jul 22, 2016 at 13:50 UTC
    Ah, that was it. Bit surprised, as that section shouldn't really have any utf8 in it (just plain text)

    Anyway, all I need to do now is check what the email's encoding is and, if it is UTF-8, convert it.

    Thanks!

    Andy

      Words like résumé and piñon are plain text and many punctuation marks are too (dashes, ellipsis, real quotes and apostrophes, etc). UTF-8 is the best (default) encoding for plain text.
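The "check the declared charset, then convert" step could be sketched roughly like this. It assumes the charset has already been pulled out of the Content-Type header and that any transfer encoding (Base64 or quoted-printable) has already been undone; `$charset` and `$raw_body` are hypothetical names, and the ISO-8859-1 fallback is just one possible choice.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical inputs: in real code these come from the parsed message.
my $charset  = 'UTF-8';                                  # from "Content-Type: ...; charset=..."
my $raw_body = "r\xC3\xA9sum\xC3\xA9 and pi\xC3\xB1on";  # undecoded octets

# Fall back to ISO-8859-1 when Encode does not know the declared charset (assumption).
my $enc  = find_encoding($charset) || find_encoding('ISO-8859-1');
my $body = $enc->decode($raw_body);                      # now a Perl character string

binmode STDOUT, ':encoding(UTF-8)';                      # encode again on the way out
print "$body\n";
```

Decoding by the declared charset, rather than testing for "utf8" specifically, means the same code path also covers messages sent as ISO-8859-1 or Windows-1252.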

Re^4: Decoding an email body, into utf8
by ultranerds (Hermit) on Jul 27, 2016 at 05:34 UTC

    Mmmm actually, that doesn't work in some cases:

    my $name = "From: =?UTF-8?B?QW5keSBOZXdieSDDrcOpw7M=?= <andy\@chambresdhotesfrance.com>";

    use MIME::QuotedPrint;

    if ($name =~ /utf-8|utf8/i) {
        $name = decode_qp($name);
        $name =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
    }

    print $IN->header;
    print "NOW: $name";

    Prints out:

    NOW: From: =?UTF-8?B?QW5keSBOZXdieSDDrcOpw7M=?= 
    ...instead of what I was expecting:

    Andy Newby íéó

    This is a valid email header passed through from Thunderbird. Any ideas why it won't decode?

    Cheers

    Andy

      Interesting. Just found a post on StackOverflow, where someone suggested using decode_base64 to decode it. And that seems to work:

      my $name = "=?UTF-8?B?QW5keSBOZXdieSDDrcOpw7M=?=";

      use MIME::Base64;

      $name =~ s|\Q=?UTF-8?B?||i;
      $name = decode_base64($name);
      $name =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;

      I wonder why the other module doesn't work?

      Cheers

      Andy
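        RFC 2047, section 4 explains it. The "?B?" part tells you the text is Base64-encoded, while "?Q?" means an encoding similar to quoted-printable. Your header uses "?B?", which is why decode_qp had no effect on it.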

        Since you already know CPAN: search term is rfc 2047
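For what it's worth, here is a minimal sketch of decoding such a header without stripping the "=?UTF-8?B?" prefix by hand. It assumes the core Encode module's 'MIME-Header' encoding is acceptable for your case, and it reuses the header from the question with the line-wrap artifact removed from the address.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Single quotes so the @ in the address is not interpolated.
my $raw = 'From: =?UTF-8?B?QW5keSBOZXdieSDDrcOpw7M=?= <andy@chambresdhotesfrance.com>';

# 'MIME-Header' decodes RFC 2047 encoded-words, both the ?B? and the ?Q? kind,
# and returns a Perl character string.
my $name = decode('MIME-Header', $raw);

binmode STDOUT, ':encoding(UTF-8)';   # encode the characters again on output
print "NOW: $name\n";
```

Because the result is already a character string, no separate from_utf8 pass over the high bytes should be needed; the only remaining step is to encode on output, as the binmode line does.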