
Ikegami has a nice explanation, but I want to add that the bug in this snippet is the call to utf8::encode:

if ($isRedirected) {
    if (!defined binmode STDOUT, ':encoding(UTF-8)') {
        warn "binmode failed: $!\n";
    }
    utf8::encode($Str);    # <--- BUG HERE
}

The function utf8::encode changes, in place, the bytes/characters that the string contains. So if you start out with 8 bytes in the string which you happen to know are an ISO-8859-1 representation of your characters, then calling utf8::encode is going to result in up to 15 or so bytes, which will be your characters represented in UTF-8. Then when you print on STDOUT, the :encoding(UTF-8) layer encodes each of those bytes again as if it were a Unicode character, resulting in about 30 bytes.

In other words, if you put the utf8 layer on a file handle, don't also encode things prior to printing them.
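To make the byte counts concrete, here is a small sketch of that double encoding. The sample string is made up, and Encode::encode stands in for what the :encoding(UTF-8) layer does to a string on output; it is an illustration, not the original poster's code.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: 8 characters, 7 of them above 0x7F
# (a-umlaut, o-umlaut, u-umlaut, their capitals, sharp s, then 'a').
my $str = "\xE4\xF6\xFC\xC4\xD6\xDC\xDFa";

# First encoding: the characters become UTF-8 bytes, in place.
my $once = $str;
utf8::encode($once);

# Second encoding: each of those bytes is treated as a code point
# and encoded again, which is what the filehandle layer would do.
my $twice = encode('UTF-8', $once);

printf "%d characters, %d bytes encoded once, %d bytes encoded twice\n",
    length $str, length $once, length $twice;
```

Each character above 0x7F takes two bytes after the first pass and four after the second, which is where the doubling comes from.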

If that doesn't make sense, you need to first understand that Perl doesn't *track* the character encoding of its strings, or even whether they are meant to be bytes or characters. It requires *you* to keep track of that and ask for the appropriate conversions at the appropriate points. The best advice for dealing with that requirement is to make sure you always have Unicode characters in your strings, and only decode or encode at the edges of your program as data comes in or goes out (such as with filehandle layers). Also make sure any non-ASCII string constants live in a source file that has "use utf8" and is saved as UTF-8 by your text editor.
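A minimal sketch of the "decode and encode only at the edges" idea, assuming UTF-8 on both sides. In-memory filehandles stand in for real files or STDIN/STDOUT so the example is self-contained, and the variable names and sample data are invented for illustration.

```perl
use strict;
use warnings;

# Input edge: the bytes arrive as UTF-8 and the layer decodes them.
# Hypothetical input: "naive" with i-diaeresis, followed by a newline.
my $input_bytes = "na\xC3\xAFve\n";
open my $in, '<:encoding(UTF-8)', \$input_bytes or die "open failed: $!";
my $word = <$in>;
close $in;
chomp $word;

# Middle of the program: the string holds characters, so uc and
# length work on characters rather than on encoded bytes.
my $shout = uc $word;

# Output edge: the layer encodes. No utf8::encode call anywhere.
my $output_bytes = '';
open my $out, '>:encoding(UTF-8)', \$output_bytes or die "open failed: $!";
print {$out} $shout;
close $out;

printf "%d characters in memory, %d bytes written\n",
    length $shout, length $output_bytes;
```

The same shape applies to STDOUT: set the layer once with binmode and then print character strings as they are.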

Edit: to clarify, if you happen to have some characters in a string which came from a Windows codepage that doesn't match Unicode's definition for 0x80-0xFF, then you first need to decode that (to get Unicode) before re-encoding as UTF-8. The ":encoding(UTF-8)" layer assumes that you start from Unicode codepoints, and will generate garbage if you started from something different.
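As a sketch of that case, assume the input is Windows-1252 (your codepage may differ). The byte 0x80 is the euro sign there but an unrelated control character at U+0080, so skipping the decode step produces different output. The hex_bytes helper and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Hypothetical input: euro sign (0x80 in Windows-1252), space, "100".
my $cp1252_bytes = "\x80 100";

# Right: decode from the real source encoding, then encode as UTF-8.
my $chars = decode('cp1252', $cp1252_bytes);
my $utf8  = encode('UTF-8', $chars);

# Wrong: encode the raw bytes as if they were already code points.
my $garbage = encode('UTF-8', $cp1252_bytes);

print "right: ", hex_bytes($utf8), "\n";
print "wrong: ", hex_bytes($garbage), "\n";
```

The "wrong" line is what you would get by printing the undecoded string through an :encoding(UTF-8) layer.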

In reply to Re: binmode(':encoding(UTF-8)') did not produce utf8 for me by NERDVANA
in thread binmode(':encoding(UTF-8)') did not produce utf8 for me by hexcoder
