    if ($isRedirected) {
        if (!defined binmode STDOUT, ':encoding(UTF-8)') {
            warn "binmode failed: $!\n";
        }
        utf8::encode($Str);   # <--- BUG HERE
    }
The function utf8::encode changes the bytes/characters that the string contains, in place. So if you start out with 8 bytes in the string which you happen to know are an ISO-8859-1 representation of your characters, then calling utf8::encode is going to result in 15 bytes, which are your characters represented in UTF-8. When you then print to STDOUT, the layer encodes those bytes a second time, as if each one were a Unicode character, and every byte above 0x7F doubles again, so you end up with roughly 30 bytes.
In other words, if you put an encoding layer such as ":encoding(UTF-8)" on a file handle, don't also encode things prior to printing them.
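To make the double encoding visible, here is a minimal sketch. It is not the original poster's code: the one-character sample string is made up, and Encode::encode stands in for what the filehandle layer does at print time.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: a single character, e-acute (U+00E9).
my $str = "\x{E9}";

# Encoding once, which is all the :encoding(UTF-8) layer needs to do.
my $once = encode('UTF-8', $str);

# The bug: encode in place first, then encode again (as the layer would).
my $copy = $str;
utf8::encode($copy);                  # $copy now holds the two bytes C3 A9
my $twice = encode('UTF-8', $copy);   # each of those bytes is treated as a character

printf "once:  %d bytes (%s)\n", length($once),  unpack('H*', $once);
printf "twice: %d bytes (%s)\n", length($twice), unpack('H*', $twice);
# once:  2 bytes (c3a9)
# twice: 4 bytes (c383c2a9)
```

The second result is what a terminal or editor would show as "Ã©" instead of "é".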
If that doesn't make sense, you need to first understand that Perl doesn't *track* character encodings of its strings, or even whether they are meant to be bytes or characters. It requires *you* to keep track of that and ask for the appropriate conversions at the appropriate points. The best advice for dealing with that requirement is to make sure you always have Unicode characters in your strings, and only decode or encode at the edges of your program as data comes in or goes out (such as with filehandle layers). Also make sure any string constants containing non-ASCII characters live in a source file that has "use utf8" and was saved as UTF-8 by your text editor.
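As a rough illustration of "decode and encode only at the edges", the sketch below uses in-memory filehandles so that it is self-contained. The variable names and the sample data are invented for the example; in a real program the handles would be files, sockets or STDIN/STDOUT.

```perl
use strict;
use warnings;

# Hypothetical input: the UTF-8 bytes for "café" plus a newline.
my $input_bytes = "caf\xC3\xA9\n";

# Edge 1: the input layer decodes bytes into characters.
open my $in, '<:encoding(UTF-8)', \$input_bytes or die "open in: $!";
my $line = <$in>;
close $in;
chomp $line;

# Inside the program we only deal with characters.
my $char_count = length $line;   # 4 characters, not 5 bytes
my $upper      = uc $line;       # Unicode-aware because the string was decoded

# Edge 2: the output layer encodes characters back into bytes.
my $output_bytes = '';
open my $out, '>:encoding(UTF-8)', \$output_bytes or die "open out: $!";
print {$out} $upper, "\n";
close $out;

printf "%d characters in, %s out\n", $char_count, unpack('H*', $output_bytes);
# 4 characters in, 434146c3890a out
```

Note that there is no call to utf8::encode or Encode::encode anywhere in the middle; the layers do all the conversion.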
Edit: to clarify, if you happen to have some characters in a string which came from a Windows codepage that doesn't match Unicode's definition for 0x80-0xFF, then you first need to decode that (to get Unicode) before re-encoding as UTF-8. The ":encoding(UTF-8)" layer assumes that you start from Unicode codepoints, and will generate garbage if you started from something different.
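For example, assuming the data came from Windows codepage 1252 (an assumption for this sketch; your codepage may differ), byte 0x80 is the euro sign there, but U+0080 in Unicode is a control character. Decoding first gives the right result:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: one byte, 0x80, which is the euro sign in cp1252.
my $raw = "\x80";

# Wrong: encoding the raw byte treats it as U+0080, a control character.
my $wrong = encode('UTF-8', $raw);

# Right: decode from the source codepage first, then encode.
my $chars = decode('cp1252', $raw);   # now the single character U+20AC
my $right = encode('UTF-8', $chars);

printf "wrong: %s\n", unpack('H*', $wrong);   # wrong: c280
printf "right: %s\n", unpack('H*', $right);   # right: e282ac
```

With a filehandle layer in place, only the decode step is needed; printing $chars to a handle with ":encoding(UTF-8)" produces the same bytes as $right.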
In reply to Re: binmode(':encoding(UTF-8)') did not produce utf8 for me
by NERDVANA
in thread binmode(':encoding(UTF-8)') did not produce utf8 for me
by hexcoder