Trying to read text that was pasted into the console window. Here's a simplified version of my code:

    use strict;
    use warnings;
    use feature 'say';    # say() needs this (or "use 5.010") on Perl 5.12
    use Win32::Console;

    my $wc = Win32::Console->new(STD_INPUT_HANDLE);
    say 'Paste stuff now.';

    while (1) {
        unless ($wc->GetEvents) {
            sleep 1;
            next;
        }

        my @ev = $wc->Input;
        my ($event_type, $key_down, $repeat_count, $v_keycode, $v_scancode,
            $ch_num, $control_key_state) = @ev;

        # wait for keyboard event (does paste count?)
        next unless $event_type == 1 && $key_down;

        # prettify for debug display
        my $char = chr($ch_num);
        $char = '?' unless $char =~ /[[:print:]]/;
        printf qq{Key: '%s' = 0x%02x = %3dd; key %02x, scan %02x\n},
            $char, $ch_num, $ch_num, $v_keycode, $v_scancode;
    }

When I paste text into this, it works fine, unless the text contains non-ASCII Unicode characters. For example, if I paste the text "dog’s leg", the program outputs:

    Paste stuff now.
    Key: 'd' = 0x64 = 100d; key 44, scan 20
    Key: 'o' = 0x6f = 111d; key 4f, scan 18
    Key: 'g' = 0x67 = 103d; key 47, scan 22
    Key: '?' = 0x00 =   0d; key 12, scan 38
    Key: '?' = 0x00 =   0d; key 63, scan 51
    Key: '?' = 0x00 =   0d; key 69, scan 49
    Key: 's' = 0x73 = 115d; key 53, scan 1f
    Key: ' ' = 0x20 =  32d; key 20, scan 39
    Key: 'l' = 0x6c = 108d; key 4c, scan 26
    Key: 'e' = 0x65 = 101d; key 45, scan 12
    Key: 'g' = 0x67 = 103d; key 47, scan 22

Now, in the text I pasted, the character after the word "dog" is U+2019, RIGHT SINGLE QUOTATION MARK. Oddly, Win32::Console::Input reports it as three key presses, with virtual key codes 0x12, 0x63, 0x69 and scancodes 0x38, 0x51, 0x49. According to some page I found on the intertubes, those are "Alt", "Numeric keypad 3", and "Numeric keypad 9". That's curious, because 39 is the (decimal) code for "'", a regular old ASCII apostrophe. And if you type Alt-Num3-Num9 yourself, you get an apostrophe.
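For what it's worth, the virtual key codes in that output decode the same way. Windows numbers the keypad keys VK_NUMPAD0 through VK_NUMPAD9 as 0x60 to 0x69 (and the Alt key, VK_MENU, is 0x12), so 0x63 and 0x69 are keypad 3 and keypad 9. A quick check of the arithmetic, with the two key codes hard-coded from the output above (the constant name is just for illustration):

```perl
use strict;
use warnings;

# Windows virtual-key codes: VK_NUMPAD0 = 0x60 ... VK_NUMPAD9 = 0x69.
use constant VK_NUMPAD0 => 0x60;

# Keypad keys reported for the pasted character (after the Alt key, 0x12).
my @numpad_keys = (0x63, 0x69);

# Map each keypad key to its digit and join them: "3" . "9" = "39".
my $decimal = join '', map { $_ - VK_NUMPAD0 } @numpad_keys;

my $typed  = chr($decimal);    # what Alt+3+9 produces: chr(39), an apostrophe
my $pasted = "\x{2019}";       # RIGHT SINGLE QUOTATION MARK, what was pasted

printf "Alt+%s gives U+%04X; the pasted character was U+%04X\n",
    $decimal, ord($typed), ord($pasted);
# prints: Alt+39 gives U+0027; the pasted character was U+2019
```

So the console delivered the best ASCII stand-in for the character, not the character itself.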

But I don't really want an apostrophe; I want U+2019. If I have to, I suppose I could watch for Alt-Num-Num combinations and translate them manually, but that would only give me a bastardized ASCII representation of what was pasted, and... ugh.
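The manual workaround might look roughly like this sketch. It is hypothetical and untested against a real console: `decode_events` is a made-up helper that takes keyboard event records as array refs in the field order `$wc->Input` returns them, and it assumes each Alt+keypad sequence ends with an Alt key-up event (the loop above skips key-up events, so its output can't confirm that). It treats the collected decimal number directly as a character code, which is right for ASCII values like 39 but not in general, and it can still only recover the apostrophe, never U+2019.

```perl
use strict;
use warnings;

use constant {
    KEY_EVENT  => 1,       # event type of keyboard records
    VK_MENU    => 0x12,    # the Alt key
    VK_NUMPAD0 => 0x60,    # VK_NUMPAD0 .. VK_NUMPAD9 are 0x60 .. 0x69
    VK_NUMPAD9 => 0x69,
};

# Hypothetical helper: rebuild text from keyboard event records.
# Keypad digits typed while Alt is held are collected, and chr(number)
# is emitted when Alt is released (ASSUMPTION: an Alt key-up event ends
# each sequence). Other key-down events contribute their character code.
sub decode_events {
    my @events = @_;
    my ($text, $digits) = ('', undef);
    for my $ev (@events) {
        my ($type, $down, undef, $vkey, undef, $ch) = @$ev;
        next unless $type == KEY_EVENT;
        if ($vkey == VK_MENU) {
            if ($down) {
                $digits = '';                               # Alt pressed: start collecting
            }
            elsif (defined $digits) {
                $text .= chr($digits) if length $digits;    # Alt released: emit the code
                undef $digits;
            }
        }
        elsif ($down && defined $digits
               && $vkey >= VK_NUMPAD0 && $vkey <= VK_NUMPAD9) {
            $digits .= $vkey - VK_NUMPAD0;                  # keypad digit while Alt is held
        }
        elsif ($down && $ch) {
            $text .= chr($ch);                              # ordinary key press
        }
    }
    return $text;
}

# Hand-built records modeled on the "dog’s" output; unused fields are 0.
# The Alt key-up record is an assumption (the original loop filters it out).
my @events = (
    [1, 1, 1, 0x44, 0x20, ord('d'), 0],
    [1, 1, 1, 0x4f, 0x18, ord('o'), 0],
    [1, 1, 1, 0x47, 0x22, ord('g'), 0],
    [1, 1, 1, 0x12, 0x38, 0, 0],    # Alt down
    [1, 1, 1, 0x63, 0x51, 0, 0],    # keypad 3
    [1, 1, 1, 0x69, 0x49, 0, 0],    # keypad 9
    [1, 0, 1, 0x12, 0x38, 0, 0],    # Alt up (assumed)
    [1, 1, 1, 0x53, 0x1f, ord('s'), 0],
);

print decode_events(@events), "\n";    # prints: dog's
```

The result has a plain ASCII apostrophe, which is exactly the lossy representation described above.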

Is there a better way?

(For what it's worth: I'm running Strawberry Perl 5.12.2 on Windows 7 Professional, 64-bit, SP1. Win32::Console version is 0.09. The input Code Page that Win32::Console is using is 65001, which is CP_UTF8.)


In reply to Unicode input for Win32::Console by Sue D. Nymme

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.