rmflow has asked for the wisdom of the Perl Monks concerning the following question:

I have the following example:
    #!/usr/bin/perl
    binmode STDOUT, ':utf8';
    print "Çirçös\n";
When I run this script I see garbage, but if I remove the binmode line:
    #!/usr/bin/perl
    print "Çirçös\n";
then 'Çirçös' is printed correctly. Since my console already uses UTF-8, I expected correct output both times. What could be wrong here?

Re: clarification on binmode STDOUT
by moritz (Cardinal) on Jul 01, 2009 at 14:15 UTC
    The problem is that you add an encoding step without a decoding step. When you don't add the utf8 layer, the string literal "Çirçös\n" is treated as a series of bytes and written to STDOUT as such. No problem.

    When you add the output layer, you tell print to convert from characters to bytes. But without a decoding step, Perl reads your source as Latin-1, one character per byte, so the Ç in the literal becomes two characters: its UTF-8 bytes 0xC3 0x87 are interpreted as U+00C3 LATIN CAPITAL LETTER A WITH TILDE and U+0087 <control>. Each of these is then encoded as UTF-8 on output, so what you'll see for the first character is Ã followed by the (usually invisible) control character U+0087.
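    To make the double encoding concrete, here is a minimal sketch using the core Encode module that replays what happens to the Ç. It assumes the editor saved the file as UTF-8, and it uses an explicit ISO-8859-1 decode to stand in for Perl reading a source file without use utf8, one character per byte.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# The two bytes a UTF-8 editor saves for "Ç" (U+00C7).
my $source_bytes = "\xC3\x87";

# Without "use utf8", each source byte becomes one character,
# which is equivalent to decoding the bytes as Latin-1.
my $chars = decode('ISO-8859-1', $source_bytes);

# The :utf8 output layer then encodes every character as UTF-8,
# so each of the two characters turns into two bytes.
my $written = encode('UTF-8', $chars);

print join(' ', map { sprintf 'U+%04X', ord } split //, $chars), "\n";
print join(' ', map { sprintf '0x%02X', ord } split //, $written), "\n";
```

    This prints U+00C3 U+0087 for the characters and 0xC3 0x83 0xC2 0x87 for the bytes sent to the terminal, which decodes them as Ã and U+0087.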

    This can be solved by also adding the line use utf8; to your program, which tells Perl that the source file, and therefore its string literals, should be decoded as UTF-8.
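    A minimal sketch of the fixed program, assuming the file is saved as UTF-8 as in the question. Apart from strict, warnings and a length check added for illustration, the only change to the original is the use utf8 line:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # decode this source file (and its literals) as UTF-8
binmode STDOUT, ':utf8';   # encode characters as UTF-8 on output

my $word = "Çirçös";

# Thanks to "use utf8" the literal holds 6 characters rather than the
# 9 bytes the editor wrote, so the output layer encodes it exactly once.
print "$word\n";
print length($word), "\n";
```

    Without use utf8 the same literal would have length 9, one per byte, and every non-ASCII byte would be encoded a second time on output.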

    I tried to describe Perl's Unicode model in this article; I hope it will help you understand what's going on.

      Thank you, it was very helpful.
Re: clarification on binmode STDOUT
by ikegami (Patriarch) on Jul 01, 2009 at 17:51 UTC

    The source file is treated as iso-latin-1 since you didn't tell Perl otherwise. So you end up seeing the result of

    decode 'UTF-8',          # Your terminal
    encode 'UTF-8',          # binmode
    decode 'iso-latin-1',    # Perl reading the source  \  MISMATCH
    encode 'UTF-8',          # Your editor              /
    "Çirçös"

    By telling Perl the source is UTF-8, the steps match up:

    decode 'UTF-8',          # Your terminal
    encode 'UTF-8',          # binmode
    decode 'UTF-8',          # Perl reading the source
    encode 'UTF-8',          # Your editor
    "Çirçös"
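    The two chains can be replayed with the core Encode module. This is a sketch only: the terminal is modelled as a UTF-8 decode, the name ISO-8859-1 is used for what is called iso-latin-1 above, and the source bytes are built with escapes so the script itself is plain ASCII.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# The bytes a UTF-8 editor saves for "Çirçös" (9 bytes).
my $file_bytes = encode('UTF-8', "\x{C7}ir\x{E7}\x{F6}s");

# Mismatched chain: Perl reads the source as Latin-1, binmode encodes
# as UTF-8, and the terminal decodes UTF-8.
my $seen_bad = decode('UTF-8', encode('UTF-8', decode('ISO-8859-1', $file_bytes)));

# Consistent chain: Perl reads the source as UTF-8 (use utf8).
my $seen_good = decode('UTF-8', encode('UTF-8', decode('UTF-8', $file_bytes)));

printf "bad:  %d characters\n", length $seen_bad;
printf "good: %d characters\n", length $seen_good;
```

    The mismatched chain keeps all 9 bytes as separate characters (starting with U+00C3 U+0087), while the consistent chain gets back the 6 original characters.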

    You want:

    #!/usr/bin/perl
    use utf8;
    use open ':std', ':locale';
    print "Çirçös\n";

    See the documentation for utf8 and open.

Re: clarification on binmode STDOUT
by Anonymous Monk on Jul 01, 2009 at 14:12 UTC
    The :utf8 layer just flips a flag; you probably wanted
    binmode STDOUT, ':encoding(UTF-8)';
    or
    use open ':locale'; # or use open ':encoding(UTF-8)';
      You'll also need :std to apply the layer to STDIN, STDOUT and STDERR:
      use open qw' :std :encoding(UTF-8) ';
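    To see what an :encoding(UTF-8) layer does on output, this sketch writes one character through such a layer into an in-memory filehandle (a reference to a scalar) instead of STDOUT, so the resulting bytes can be inspected:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $char = "\x{C7}";   # "Ç" as a single character (U+00C7)

# Write through an :encoding(UTF-8) layer into an in-memory buffer,
# so we can inspect the bytes the layer actually produces.
open my $fh, '>:encoding(UTF-8)', \my $buffer or die "open: $!";
print {$fh} $char;
close $fh or die "close: $!";

# The layer turned one character into its two UTF-8 bytes.
print join(' ', map { sprintf '0x%02X', ord } split //, $buffer), "\n";
```

    Applied to STDOUT with binmode or use open, the layer performs the same conversion on everything printed; non-ASCII literals in the source still need use utf8 to be read as characters in the first place.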
Re: clarification on binmode STDOUT
by Anonymous Monk on Jul 01, 2009 at 14:11 UTC
    Perl is parsing the file wrong; you need:
    use utf8;