nadroj has asked for the wisdom of the Perl Monks concerning the following question:

i have been trying to get something to work in windows for almost 3 weeks. i've searched many places and can't seem to find an answer.

how do i input unicode characters into a perl command-line program (using "cmd.exe") and have the character(s) printed back to the console properly?

i have tried using the perl "-C" switch to "tell" perl how to expect command line input, but it does not work. i have tried setting the "binmode", and "encode"/"decode"-ing. i input characters either by using the windows "character map" or using "alt+####" (for example alt+2190 for a left arrow). when i try to output the character it prints a "?" (question mark), which as i understand it happens because it does not know how to represent or interpret the character.

in cmd.exe i have used "chcp 65001" to set the encoding to UTF-8, and changed the font to "Lucida Console".

if i write a perl script to output a hardcoded unicode character (ie "\x{2190}") it does not print properly to the screen; however, if i write it to a file or redirect the output to a file, it appears perfectly in the file. for example, if i were to run the following: print "(\x{2190})"; it would appear in the console as something like: ")<left arrow printed properly here>)". notice the first open parenthesis is printed as a close parenthesis. this happens with any character, not just parentheses. there seems to be some confusion on cmd.exe's part: the last character is either printed 2 times or overwrites the first character.

using the same perl script in linux (ie "konsole") works, and, as stated above, the script works when output is to a file. this likely means that it is a cmd.exe problem and not perl. i think a similar solution is necessary to allow unicode input in windows too.
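
One way to check the "perl is fine, the console is not" reading without involving a console at all is to print into an in-memory filehandle and look at the bytes. This is only a minimal sketch: `utf8_bytes_for` is a hypothetical helper name, and the in-memory handle stands in for the redirected file.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through a UTF-8 encoding layer into an
# in-memory buffer and return the raw bytes that were written.
sub utf8_bytes_for {
    my ($text) = @_;
    my $buffer = '';
    open my $out, '>:encoding(UTF-8)', \$buffer
        or die "cannot open in-memory handle: $!";
    print {$out} $text;
    close $out or die "cannot close in-memory handle: $!";
    return $buffer;
}

my $bytes = utf8_bytes_for("(\x{2190})");

# Show each byte in hex: one "(", three bytes for the arrow, one ")".
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
# prints: 28 E2 86 90 29
```

If the bytes contain exactly one 28 and one 29, the doubled parenthesis seen on screen is not coming from the data perl writes.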

thanks for any help

Replies are listed 'Best First'.
Re: Unicode input in windows xp (cmd.exe)
by ikegami (Patriarch) on Sep 01, 2008 at 06:04 UTC

    First things first, using print to output a generated character works fine:

    >perl -CS -le"print chr(0x2190)"
    ←
    

    Even with the parens you used:

    >perl -CS -le"print qq{(\x{2190})}"
    (←)
    

    And if fed the UTF-8 encoded character, it's understood:

    >perl -CS -le"print chr(0x2190)" | perl -CS -le"use Devel::Peek; Dump($x=<>)"
    SV = PV(0x226144) at 0x225ff4
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x182ce3c "\342\206\220\n"\0 [UTF8 "\x{2190}\n"]
      CUR = 4
      LEN = 80
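
For readers unfamiliar with the dump notation: the PV line shows the three UTF-8 bytes of U+2190 in octal, and CUR = 4 is those three bytes plus the newline. A small sketch that reproduces the octal form (`octal_dump` is a hypothetical helper, not part of Devel::Peek):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render every byte as a backslash-octal escape,
# the notation Devel::Peek uses for non-ASCII bytes in the PV line.
sub octal_dump {
    my ($bytes) = @_;
    return join '', map { sprintf '\\%03o', ord } split //, $bytes;
}

my $arrow = chr(0x2190);               # one character
my $bytes = encode('UTF-8', $arrow);   # its UTF-8 byte sequence

printf "%d character, %d bytes: %s\n",
    length($arrow), length($bytes), octal_dump($bytes);
# prints: 1 character, 3 bytes: \342\206\220
```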
    

    However, I can't seem to paste the character into the console.

    >perl -CS -le"use Devel::Peek; Dump($x=<>)"
    a←b
    SV = PV(0x226144) at 0x225ff4
      REFCNT = 1
      FLAGS = ()
      PV = 0x182ce3c ""\0     ← The whole line is dropped!!!
      CUR = 0
      LEN = 80
    

    Alt-2190 doesn't work, but then again, it doesn't work in notepad either.

    This was ActivePerl 5.8.8 on Windows XP

      ikegami:
      the chr works but i wanted to output the character as i specified in the first post, ie print "(\x{2190})"; using "-CS" or "-CIO", etc only removes the warning and doesn't make the output correct. my output is always "(←))" (notice 2 close parentheses).

      running "print qq{(\x{2190})};" in a script gives the same output as above.

      note: i'm running my scripts as "perl -CS <filename>.pl" or "perl <filename>.pl"

      having hardcoded unicode characters in my script isn't really what i want to do, but i was testing those to see if it could even print the characters properly, which it can't seem to do. as described in my first post, the script that prints \x{2190} shows 2 closing parentheses on the console. however, if i redirect the output (using ">") to a file and then open the file with, say, notepad, it is (properly) detected as UTF-8 and the output is correct (1 open and 1 close parenthesis with an arrow in the middle). likewise, running a script with contents: print "(\x{2190})\n"; will print TWO newline characters on the console, while redirecting the output to a file and viewing the file displays it perfectly (with only 1 newline).

      to input unicode characters at the command line you can copy and paste the character from windows' "character map". there is also a registry key you need to set to be able to enter unicode characters by code (ie "alt+2190"; note you actually press "+" when doing it this way). i don't remember the registry key right now, but the result is the same as copying/pasting from character map.

      Anonymous Monk:
      i should have shown my code, i agree, but i have it above now. the version of perl i'm using does support these options (and i assume it would have shown an error otherwise). perl -v gives: "This is perl, v5.10.0 built for MSWin32-x86-multi-thread".

      regarding your link, my font is fine, as the characters can be displayed (ie in notepad with the font set to Lucida Console, or in cmd.exe the character can be printed, however improperly when mixed, as in with parentheses above). see here for other reference, notice U+2190 is supported in this list (http://www.fileformat.info/info/unicode/font/lucida_console/list.htm).

      here is a sample script to try, i put comments in it to explain the setup.

      # create a file in notepad with the single unicode character U+2190
      # (left arrow) on the 2nd line, save as "UTF8"
      open ( FH, "<:encoding(UTF-8)", "file.txt");

      # since the BOM (byte order mark) will be read, we dont want it
      # printed, so skip the first line. this is why we typed the arrow on
      # the second line, NOT first line!
      <FH>;

      # now we are at the second line, save the arrow that should be here
      my $line = <FH>;

      # this gets rid of the wide character warning, as we know it is UTF8
      utf8::encode($line);

      # toggle between the following two print statements to see how they
      # differ
      # note the first one should print fine, but is kind of useless because
      # if you print anything else, the output will be incorrect. maybe
      # because of mixing encodings, but it works fine redirected to a file.
      # the second print statement shows this
      #print "$line";
      #print "($line)"
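
The script above dodges the BOM by putting the arrow on the second line. An alternative is to strip a leading U+FEFF after decoding, so the arrow can sit on the first line. A minimal sketch, assuming the file really is UTF-8 with a BOM as notepad saves it; `read_first_line_without_bom` is a hypothetical helper name, and the demo reads from an in-memory copy of such a file rather than from "file.txt".

```perl
use strict;
use warnings;

# Hypothetical helper: read the first line of a UTF-8 source (a path or a
# reference to a byte string), drop a leading BOM and the line ending.
sub read_first_line_without_bom {
    my ($source) = @_;
    open my $fh, '<:encoding(UTF-8)', $source
        or die "cannot open source: $!";
    my $line = <$fh>;
    close $fh;
    return undef unless defined $line;
    $line =~ s/\A\x{FEFF}//;    # the BOM decodes to the character U+FEFF
    $line =~ s/\r?\n\z//;       # notepad writes CRLF line endings
    return $line;
}

# Stand-in for the notepad file: BOM, left arrow, CRLF.
my $file_bytes = "\xEF\xBB\xBF\xE2\x86\x90\r\n";
my $line = read_first_line_without_bom(\$file_bytes);

printf "U+%04X\n", ord $line;
# prints: U+2190
```

The returned string holds decoded characters, so it still has to be encoded on output (for example with a UTF-8 layer on STDOUT, or utf8::encode as in the script above).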

      thanks a lot for your time guys, and i hope you know what i can do to allow unicode input from command line, with proper output.

        basically, i want to be able to input unicode characters (such as left arrow, U+2190) to my perl script in cmd.exe and have them printed properly. this comes down to getting the following to work: a file "myScript.pl" with contents: print "you entered ($ARGV[0])\n"; run in cmd.exe as: "perl myScript.pl ←"
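
To narrow down where the "?" comes from, it may help to look at the raw bytes perl actually receives in @ARGV instead of printing them to the console. A rough diagnostic sketch follows; `describe_arg` is a hypothetical helper, and the idea that the argument may already be converted before perl sees it is an assumption, not something confirmed in this thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: show the bytes of one raw argument in hex and, if
# they form valid UTF-8, the code points they decode to.
sub describe_arg {
    my ($raw) = @_;
    my $hex  = join ' ', map { sprintf '%02X', ord } split //, $raw;
    my $text = eval {
        decode('UTF-8', $raw, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    my $chars = defined $text
        ? join(' ', map { sprintf 'U+%04X', ord } split //, $text)
        : 'not valid UTF-8';
    return "bytes [$hex] -> $chars";
}

# Run without -C so that @ARGV holds the undecoded bytes.
print describe_arg($_), "\n" for @ARGV;
```

If an argument typed as ← is reported as bytes [3F] -> U+003F, the arrow was replaced by a literal question mark before perl received it, and no -C setting or decoding inside the script can bring it back. If it is reported as bytes [E2 86 90] -> U+2190, the input side is fine and only the output needs attention.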

        i don't care about getting wide character print warnings.

Re: Unicode input in windows xp (cmd.exe)
by Anonymous Monk on Sep 01, 2008 at 04:05 UTC
    i have tried using the perl "-C" switch to "tell" perl how to expect command line input, but it does not work.
    How do you know? Maybe you have an old perl which doesn't support that option. Maybe you have a typo. Show your code.

    this likely means that it is a cmd.exe problem and not perl. i think a similar solution is necessary to allow unicode input in windows too.
    Yes, cmd.exe is responsible for display (all those ??). So try searching microsoft.public.win32.programmer.international or msdn ... also 'help cmd'

Re: Unicode input in windows xp (cmd.exe)
by massa (Hermit) on Sep 01, 2008 at 19:09 UTC
    What happens when you use perl -C127 ??
    []s, HTH, Massa (κς,πμ,πλ)
      when running the script from my last post (ie printing $ARGV[0]) with your "-C127" i get the same output: it prints "?" in place of the left arrow.
        is this not possible?
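
For reference, the number given to -C is a sum of per-letter bit values. The sketch below decodes such a number into letters, using the values as I recall them from perlrun (worth double-checking against your perl's own perlrun); `decode_c_flags` is a hypothetical helper.

```perl
use strict;
use warnings;

# Bit values of the -C letters, recalled from perlrun (treat as an
# assumption until checked against the local documentation).
my @C_FLAGS = (
    [ I => 1  ],   # STDIN is assumed to be UTF-8
    [ O => 2  ],   # STDOUT will be UTF-8
    [ E => 4  ],   # STDERR will be UTF-8
    [ i => 8  ],   # UTF-8 is the default layer for input streams
    [ o => 16 ],   # UTF-8 is the default layer for output streams
    [ A => 32 ],   # @ARGV elements are expected to be UTF-8
    [ L => 64 ],   # make the above conditional on locale variables
);

# Hypothetical helper: turn a numeric -C value into its letter form.
sub decode_c_flags {
    my ($number) = @_;
    return join '', map { $_->[0] } grep { $number & $_->[1] } @C_FLAGS;
}

print "-C127 = -C", decode_c_flags(127), "\n";   # prints: -C127 = -CIOEioAL
print "-C63  = -C", decode_c_flags(63),  "\n";   # prints: -C63  = -CIOEioA
```

Note that 127 includes L. If perlrun is read correctly, L makes the other flags depend on the locale environment variables (LC_ALL, LC_CTYPE, LANG), which are often unset on windows, so -C63 (or -CSDA) may be the more meaningful thing to test. Neither helps if the "?" is already in the argument before perl starts.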