in reply to clarification on binmode STDOUT

The problem is that you add an encoding step without a decoding step. Without the utf8 layer, the string literal "Çirçös\n" is treated as a series of bytes and written to STDOUT unchanged. No problem.

When you add the output layer, you tell print to convert from characters to bytes. The bytes of the literal are then interpreted as characters, one character per byte, with Latin-1 as the default. The UTF-8 bytes of Ç are 0xC3 0x87, and they are interpreted as U+00C3 LATIN CAPITAL LETTER A WITH TILDE and U+0087 <control>, so what you'll see for the first character is Ã followed by a (usually invisible) control character.
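To make the double encoding visible, here is a minimal sketch that reproduces it with the core Encode module. It assumes the source file is saved as UTF-8, so the bytes are written as escapes to keep the example independent of the editor; the helper hex_bytes is only illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Illustrative helper: show a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# The two bytes a UTF-8 source file contains for "Ç" (U+00C7).
# Without "use utf8", perl sees the literal as exactly these bytes.
my $literal = "\xC3\x87";

# Each byte counts as one character (U+00C3 and U+0087).
my $undecoded_length = length $literal;

# What an encoding output layer does to that string:
# both characters are encoded again, giving four bytes.
my $double_hex = hex_bytes(encode('UTF-8', $literal));

# Decoding first yields the single character U+00C7,
# which encodes back to the original two bytes.
my $decoded     = decode('UTF-8', $literal);
my $correct_hex = hex_bytes(encode('UTF-8', $decoded));

print "characters without decoding: $undecoded_length\n";  # 2
print "encoded without decoding:    $double_hex\n";        # C3 83 C2 87
print "encoded after decoding:      $correct_hex\n";       # C3 87
```

The four bytes C3 83 C2 87 are what a UTF-8 terminal renders as Ã plus a control character.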

This can be solved by also adding the line use utf8; to your program, which tells perl that the source file, and therefore its string literals, should be decoded as UTF-8.
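A minimal sketch of the corrected program might look like this. It assumes the file itself is saved as UTF-8 and that the terminal expects UTF-8; the variable $word is only there to show the character count.

```perl
use strict;
use warnings;
use utf8;                  # decode string literals in this file from UTF-8

binmode STDOUT, ':utf8';   # encode characters to UTF-8 on output

my $word = "Çirçös";

# With "use utf8" the literal is six characters, not nine bytes.
my $char_count = length $word;

print "$word\n";
print "$char_count characters\n";
```

Decoding on input (use utf8) and encoding on output (the layer) now form a matching pair, so each character is encoded exactly once.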

I tried to describe Perl's Unicode model in this article; I hope it will help you understand what's going on.

Re^2: clarification on binmode STDOUT
by rmflow (Beadle) on Jul 01, 2009 at 14:53 UTC
    Thank you, it was very helpful.