in reply to Re: Unicode problem with some letters
in thread Unicode problem with some letters

Ok, thanks.
But can you tell me why the output looks like "�" when I don't set the output layer to utf8? Is Perl eating my data?

Re^3: Unicode problem with some letters
by moritz (Cardinal) on Aug 21, 2011 at 19:54 UTC

    When you don't specify :utf8 or :encoding(UTF-8) on a filehandle, Perl assumes Latin-1 (aka ISO-8859-1):

    $ echo -e "\xC3\xA0" | perl -pne 'BEGIN{binmode STDIN, ":utf8"}' | hexdump -C
    e0

    Latin-1 0xE0 encodes the codepoint U+00E0 LATIN SMALL LETTER A WITH GRAVE, which is the character that the UTF-8 string C3 A0 encodes.

    Since your terminal is configured to receive UTF-8 output (I suppose), it doesn't know what to do with perl's non-UTF-8 output (a lone 0xE0 byte is not a valid UTF-8 sequence), and shows the general "I'm confused" replacement character.

      Thank you. Now I totally understand.