
...but I am confused as to why

The thing is that the socket Mail::Sender prints to is not set up to handle Perl Unicode strings, such as the ones you get back from decode_entities. Whenever you print a Unicode string (i.e. one that is flagged Perl-internally as Unicode, with the "utf8" flag on) to a filehandle or socket that has not been opened for UTF-8, you get the "Wide character in print" warning if the string contains any 'wide' characters (i.e. code points > 255).
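A minimal sketch of that warning, assuming an in-memory filehandle as a stand-in for Mail::Sender's socket (the sample string is made up). The handle has no encoding layer, so printing a string that contains U+263A (> 255) triggers the warning, and Perl writes out its internal UTF-8 bytes anyway.

```perl
use strict;
use warnings;

my $text = "caf\x{e9} \x{263A}";    # U+263A is > 255, i.e. a "wide" character
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect warnings instead of printing them

# Stand-in for the socket: an in-memory handle with no encoding layer
open my $fh, '>', \my $buf or die "open: $!";
print {$fh} $text;                  # emits "Wide character in print"
close $fh;

print "warned: ", ((grep { /Wide character/ } @warnings) ? 'yes' : 'no'), "\n";
```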

Encode::encode('utf8', ...) essentially removes that utf8 flag: it encodes the string from Perl's internal Unicode representation into a byte string, which in this case holds the data in proper UTF-8 encoding, but without the utf8 flag set. That's why you no longer get the warning from Mail::Sender: in a byte string, no value is > 255. (As already implied, Mail::Sender hasn't been written to accept Unicode strings, even if you declare the charset to be 'utf8'.)
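To see the difference between the two kinds of string, here is a small sketch with a made-up sample string. It uses 'UTF-8' (Encode's strict variant) rather than 'utf8'; for ordinary text like this both produce the same bytes. The character string is 6 characters long with the flag on; the encoded version is 9 bytes, flag off, and no value exceeds 255.

```perl
use strict;
use warnings;
use Encode qw(encode);
use List::Util qw(max);

my $chars = "caf\x{e9} \x{263A}";      # character string: 6 characters, one of them > 255
my $bytes = encode('UTF-8', $chars);   # byte string: the same text as UTF-8 octets

for ([chars => $chars], [bytes => $bytes]) {
    my ($name, $str) = @$_;
    printf "%-5s length %d, utf8 flag %s, highest value %d\n",
        $name, length $str,
        (utf8::is_utf8($str) ? 'on' : 'off'),
        max(map { ord } split //, $str);
}
```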

If you want to explore this further, you could look into Mail::Sender's Connect routine (line 934), where the socket is opened. If you were to add (for testing purposes)

binmode($s, ":utf8");

before the return $s; statement ($s is the socket), I predict you'd no longer need to Encode::encode your $input. Just in case you feel like playing around... :)
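A sketch of that prediction, again with an in-memory handle standing in for $s (this is not Mail::Sender's actual code). With the :utf8 layer in place, printing the character string directly produces the same bytes as encode() would, and no warning is raised.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "caf\x{e9} \x{263A}";
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Stand-in for the socket $s (assumption: in-memory handle, not a real socket)
open my $s, '>', \my $sent or die "open: $!";
binmode($s, ':utf8');               # the layer suggested above
print {$s} $text;                   # no manual encode() needed
close $s;

print "same bytes as encode(): ", ($sent eq encode('UTF-8', $text) ? 'yes' : 'no'), "\n";
print "warnings: ", scalar(@warnings), "\n";
```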

Re^4: Wide characters in e-mail
by Jenda (Abbot) on May 03, 2008 at 23:54 UTC

    The problem with this solution is that it breaks emails that are not UTF-8. If I send a message with some non ASCII Latin1 (well, windows1252) characters, with this binmode() I receive them converted to UTF-8.
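A sketch of the effect described above, assuming the body has already been encoded to cp1252 bytes and using an in-memory handle as a stand-in for the socket. Through a :utf8 layer, Perl treats each byte > 127 as a Latin-1 character and writes it out as two UTF-8 bytes, so the 0xE9 for "é" goes out as C3 A9.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $body = encode('cp1252', "caf\x{e9}");       # hypothetical body, already cp1252 bytes: "caf\xE9"

open my $s, '>', \my $sent or die "open: $!";   # stand-in for the socket
binmode($s, ':utf8');
print {$s} $body;                               # byte 0xE9 is upgraded to U+00E9, then written as UTF-8
close $s;

printf "%vX\n", $sent;                          # hex dump of the bytes that actually go out
```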

    So I guess I should binmode($s, ":utf8"); only for the UTF-8 body of the message or the UTF-8 message part, and switch it back with binmode($s); afterwards. Though I'm afraid of what it would do if someone did the encode('utf8', ...) on the text before passing it to Mail::Sender :-(

    So I'm afraid of making that change.
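The per-part toggling idea can be sketched like this. It is a hypothetical illustration, not Mail::Sender's code: an in-memory handle replaces the socket, and the two message parts are made up. The cp1252 part is written through a raw handle, only the UTF-8 part goes through :utf8, and binmode($s) (no layer, i.e. raw bytes) switches back afterwards. It assumes the caller passes the UTF-8 part as a character string, not as already-encoded bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical message parts
my $cp1252_part = encode('cp1252', "na\x{ef}ve \x{20AC}");  # already cp1252 bytes
my $utf8_part   = "caf\x{e9} \x{263A}";                      # character string

open my $s, '>', \my $sent or die "open: $!";   # stand-in for the socket
binmode($s);                    # raw bytes: the cp1252 part goes out unchanged
print {$s} $cp1252_part, "\n";
binmode($s, ':utf8');           # UTF-8 layer only for the UTF-8 part
print {$s} $utf8_part, "\n";
binmode($s);                    # back to raw bytes afterwards
close $s;

printf "%d bytes sent\n", length $sent;
```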

      If I send a message with some non ASCII Latin1 (well, windows1252) characters,

      You can't do that given $mail{'Content-type'} = 'text/plain; charset="utf-8"';. It would be like Verizon quoting a price of 0.002 *cents* per kilobyte but charging you 0.002 *dollars* per kilobyte. You can't tell the client you're using one encoding but actually use another.

      If you use $mail{'Content-type'} = 'text/plain; charset="UTF-8"';, then you'd use binmode($s, ":encoding(UTF-8)"); or encode("UTF-8", $text).

      If you use $mail{'Content-type'} = 'text/plain; charset="cp1252"';, then you'd use binmode($s, ":encoding(cp1252)"); or encode("cp1252", $text).
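A sketch showing that the two options are interchangeable for either charset, using in-memory handles in place of the socket and a made-up sample string that is representable in both UTF-8 and cp1252. Each charset produces identical bytes whether you call encode() yourself or let an :encoding(...) layer do it (11 bytes for UTF-8, 8 for cp1252 here).

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "na\x{ef}ve \x{20AC}5";   # characters that exist in both UTF-8 and cp1252
my %out;                             # $out{$charset}{encode|layer} = bytes written

for my $charset ('UTF-8', 'cp1252') {
    # Option 1: encode() the text yourself and write it to a raw (binary) handle
    open my $raw, '>', \$out{$charset}{encode} or die "open: $!";
    binmode($raw);
    print {$raw} encode($charset, $text);
    close $raw;

    # Option 2: write the character string through an :encoding(...) layer
    open my $enc, '>', \$out{$charset}{layer} or die "open: $!";
    binmode($enc, ":encoding($charset)");
    print {$enc} $text;
    close $enc;

    printf "%-6s %2d bytes, identical: %s\n", $charset, length $out{$charset}{encode},
        ($out{$charset}{encode} eq $out{$charset}{layer} ? 'yes' : 'no');
}
```

Pick one of the two per handle. Combining them is the mistake discussed next.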

      Though I'm afraid of what it would do if someone did the encode('utf8', ...) on the text before passing it to Mail::Sender :-(

      It would produce junk, because the already-encoded bytes would be encoded a second time. You can't use both encode($encoding, ) and binmode(, ":encoding($encoding)") on the same text.
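A sketch of that double encoding, with an in-memory handle in place of the socket. encode() turns "é" into the bytes C3 A9. The :encoding(UTF-8) layer then treats each of those bytes as a character (U+00C3, U+00A9) and encodes them again, so a UTF-8 reader sees "cafÃ©" instead of "café".

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "caf\x{e9}";
my $once = encode('UTF-8', $text);             # "caf\xC3\xA9": correct UTF-8

# The mistake: encode() first AND write through an :encoding(UTF-8) layer
open my $s, '>', \my $sent or die "open: $!";  # stand-in for the socket
binmode($s, ':encoding(UTF-8)');
print {$s} $once;                              # each byte is treated as a character and encoded again
close $s;

printf "encoded once: %d bytes, encoded twice: %d bytes\n", length $once, length $sent;
printf "a UTF-8 reader decodes it as %d characters instead of %d\n",
    length(decode('UTF-8', $sent)), length $text;
```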