Sue D. Nymme has asked for the wisdom of the Perl Monks concerning the following question:

Trying to read text that was pasted into the console window. Here's a simplified version of my code:

use feature 'say';
use Win32::Console;

my $wc = Win32::Console->new(STD_INPUT_HANDLE);
say 'Paste stuff now.';

while (1) {
    unless ($wc->GetEvents) {
        sleep 1;
        next;
    }
    my @ev = $wc->Input;
    my ($event_type, $key_down, $repeat_count, $v_keycode,
        $v_scancode, $ch_num, $control_key_state) = @ev;

    # wait for keyboard event (does paste count?)
    next unless $event_type==1 && $key_down;

    # prettify for debug display
    my $char = chr($ch_num);
    $char = '?' unless $char =~ /[[:print:]]/;
    printf qq{Key: '%s' = 0x%02x = %3dd; key %02x, scan %02x\n},
        $char, $ch_num, $ch_num, $v_keycode, $v_scancode;
}

When I paste text into this, it works fine unless the text contains non-ASCII Unicode characters. For example, if I paste the text "dog’s leg", the program outputs:

Paste stuff now.
Key: 'd' = 0x64 = 100d; key 44, scan 20
Key: 'o' = 0x6f = 111d; key 4f, scan 18
Key: 'g' = 0x67 = 103d; key 47, scan 22
Key: '?' = 0x00 =   0d; key 12, scan 38
Key: '?' = 0x00 =   0d; key 63, scan 51
Key: '?' = 0x00 =   0d; key 69, scan 49
Key: 's' = 0x73 = 115d; key 53, scan 1f
Key: ' ' = 0x20 =  32d; key 20, scan 39
Key: 'l' = 0x6c = 108d; key 4c, scan 26
Key: 'e' = 0x65 = 101d; key 45, scan 12
Key: 'g' = 0x67 = 103d; key 47, scan 22

Now, in the text I pasted, the character after the word "dog" is U+2019, RIGHT SINGLE QUOTATION MARK. Oddly, Win32::Console::Input reports it as three key events with scancodes 0x38, 0x51, 0x49, which, according to some page I found on the intertubes, are "Alt key", "Numeric keypad 3", "Numeric keypad 9". That is curious, because 39 is the (decimal) code for "'"; that is, a regular old ASCII apostrophe. And if you type Alt-Num3-Num9, you'll get an apostrophe.

But I don't really want an apostrophe; I want U+2019. If I have to, I suppose I could search for Alt-Num-Num combinations and manually translate them, and get a bastardized ASCII representation of what was pasted, but... ugh.
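For illustration, the manual translation described above might look like the sketch below. It works from the virtual-key codes in the debug output (0x12 for Alt, 0x63 and 0x69 for keypad 3 and 9) rather than the scancodes. The helper name `decode_alt_numpad` is hypothetical, and the sketch assumes the caller has already collected the key codes of one Alt sequence. It can only recover what the console delivered, which here is the ASCII apostrophe, not U+2019.

```perl
use strict;
use warnings;

# Virtual-key codes as they appear in the debug output above.
use constant {
    VK_MENU    => 0x12,    # Alt
    VK_NUMPAD0 => 0x60,    # keypad digits 0..9 are 0x60..0x69
    VK_NUMPAD9 => 0x69,
};

# Hypothetical helper: given the virtual-key codes of consecutive key-down
# events, return the character an Alt+keypad sequence stands for, or undef
# if the list is not such a sequence.
sub decode_alt_numpad {
    my @vk = @_;
    return unless @vk >= 2 && shift(@vk) == VK_MENU;

    my $code = 0;
    for my $k (@vk) {
        return if $k < VK_NUMPAD0 || $k > VK_NUMPAD9;
        $code = $code * 10 + ($k - VK_NUMPAD0);    # accumulate decimal digits
    }
    return chr $code;
}

# The three events reported for the pasted quotation mark: Alt, Num3, Num9.
my $ch = decode_alt_numpad(0x12, 0x63, 0x69);
printf "decoded: %s (decimal %d)\n", $ch, ord $ch;
```

Because the original code point is already lost by the time these events arrive, this is at best the "bastardized ASCII" fallback, not a way to get U+2019 back.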

Is there a better way?

(For what it's worth: I'm running Strawberry Perl 5.12.2 on Windows 7 Professional, 64-bit, SP1. Win32::Console version is 0.09. The input Code Page that Win32::Console is using is 65001, which is CP_UTF8.)

Re: Unicode input for Win32::Console
by BrowserUk (Patriarch) on Oct 24, 2011 at 21:11 UTC

    You will need to use the high-level InputChar() rather than the low-level Input(). And you will almost certainly need to set the console's input code page (try 65001).


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hi BrowserUk. Thanks for your reply. I tried your suggestion. Here is a new test program:

      use feature 'say';
      use Win32::Console;

      my $wc = Win32::Console->new(STD_INPUT_HANDLE);
      $wc->InputCP(65001);
      say 'Paste stuff now.';

      Char:
      while (1) {
          my $char = $wc->InputChar(1);
          if (!defined $char) {
              say 'Read (undef)';
              next Char;
          }

          # "pretty" character, for display
          my $pch = $char =~ /[[:^print:]]/ ? '?' : $char;

          # Deconstruct the character
          my @c = unpack 'C*', $char;
          say "Read: '$pch' = ", join ' - ', map sprintf('%02X',$_), @c;

          last if $char eq "\n";
      }

      It didn't work. InputChar returns undef when I paste a Unicode character. Here's the output of the program when I pasted "# “dog’s” leg":

      Read: '#' = 23
      Read: ' ' = 20
      Read (undef)
      Read: 'd' = 64
      Read: 'o' = 6F
      Read: 'g' = 67
      Read (undef)
      Read: 's' = 73
      Read (undef)
      Read: ' ' = 20
      Read: 'l' = 6C
      Read: 'e' = 65
      Read: 'g' = 67
      Read: '?' = 0D
      Read: '?' = 0A

      The doco for InputChar says that it returns undef "on errors". But the module doesn't provide an interface to the Windows Console LastError function, so I can't see how to tell what error occurred.

      Any suggestions?

        But the module doesn't provide an interface to the Windows Console LastError function, so I can't see how to tell what error occurred.

        Use the Perl built-in $^E to display the Windows error code or text.
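A minimal sketch of how the `$^E` suggestion could be applied. `$^E` is a dual-valued variable: in numeric context it gives the OS error code, in string context the message. The helper name `last_os_error` is hypothetical, and a failed `open` stands in for the failing call so the sketch runs without a console. Whether `InputChar` leaves a meaningful value in `$^E` when it returns undef is an assumption the thread does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper: snapshot $^E right after a failed call, before any
# other system call has a chance to overwrite it.
sub last_os_error {
    my ($what) = @_;
    my ($code, $text) = ($^E + 0, "$^E");    # numeric and string views
    return sprintf '%s failed: %s (code %d)', $what, $text, $code;
}

# Stand-in failure so the sketch is runnable anywhere.
my $missing = 'no-such-file-for-demo.txt';
open(my $fh, '<', $missing)
    or print last_os_error("open $missing"), "\n";

# In the test program above, the equivalent check would be (assumption):
#   say last_os_error('InputChar') unless defined $char;
```

If the reported code is 0, the undef probably did not come from a failed Windows API call, which would itself be useful to know.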

        I'll try to take a look at this later.

