
cp65001 is UTF-8, so that's good. What do you expect instead of <הער׳11>? It would also be good to have the input and output in hex form.
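
One way to get the hex form in Perl is sketched below. This is only an illustration: the helper name hex_bytes is hypothetical, and the sample characters (U+2660 and U+0100) are stand-ins for the real data. It encodes a character string to UTF-8 and prints each byte as a hex pair.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 bytes of a character string
# as space-separated lowercase hex pairs.
sub hex_bytes {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    return join ' ', unpack '(H2)*', $bytes;
}

# U+2660 (BLACK SPADE SUIT) followed by U+0100 (LATIN CAPITAL LETTER A WITH MACRON)
print hex_bytes("\x{2660}\x{100}"), "\n";   # prints: e2 99 a0 c4 80
```

Comparing the hex of what the script reads with the hex of what it writes shows whether the bytes themselves are wrong or only their display.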

Re^6: print UTF-8 problem
by HelenCr (Monk) on Feb 17, 2012 at 00:36 UTC

    It seems that this is not a Perl problem. I have a wide-character text file that looks fine in Notepad ("UTF-8 encoding"), in Notepad++, and when pasted into MS Word. But when I open a "DOS box" (Windows console) and run "type file.txt", it prints gibberish.

    And yes, I followed all the recommendations for Unicode on the Windows console: I opened the console using "cmd /u", changed the font to Lucida Console, and entered "chcp 65001".

        @nikosv: I did that, and it helped. Many thanks.

      If the file is UTF-8 and the chcp is 65001, it should work.

      What do you get from the following?

      perl -CS -E"say map chr, 0x2660, 0x100;"

      You should get the following (as I do):

      ♠Ā

      Is cygwin involved at all? (e.g. Are you using the bash shell?)

      What do you expect to get instead of <הער׳11>?

        @ikegami: There are two problems: 1. the Windows console font lacks many Unicode glyphs; and 2. Perl I/O does not cope well with Windows console output buffering.
        It was solved by using SourceForge's "Console" utility (aka "Console2") and specifying "unix" for the output binmode.
        There is no need to use Cygwin.
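
The exact binmode call is not shown in the thread. A minimal sketch of what specifying "unix" for the output binmode might look like is given below, assuming the layer string ':unix:encoding(UTF-8)'; the poster may have used a different combination of layers.

```perl
use strict;
use warnings;

# Assumption: the thread does not give the exact layer string.
# ':unix' asks for the low-level layer that writes through the file
# descriptor, and ':encoding(UTF-8)' encodes characters as UTF-8.
binmode(STDOUT, ':unix:encoding(UTF-8)')
    or die "binmode failed: $!";

# U+2660 and U+0100, the same characters as in the one-liner test.
print "\x{2660}\x{100}\n";
```

Whether the glyphs then display correctly still depends on the console font.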