The Win32::Clipboard module only grabs the contents as ANSI Text, so any characters that cannot be represented in the current code page turn into '?'.

Using the more general GetAs method, the data is returned in the expected underlying Windows representation, but it is truncated at the first 0 byte! If it returned the contents properly as a buffer, I could then process it into a Perl-friendly format. But since it does not return the full data (up to the stated length, complete with embedded NULs), I'm stuck.
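
To make the truncation concrete, here is a minimal sketch that does not touch the clipboard at all. It assumes the Unicode format is UTF-16LE with a terminating NUL, and it builds a simulated buffer with Encode. The helper name utf16_buffer_to_string is hypothetical, not part of Win32::Clipboard.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn a complete UTF-16LE buffer (assumed to be what
# the Unicode clipboard format holds) into a Perl character string.
sub utf16_buffer_to_string {
    my ($raw) = @_;
    my $text = decode('UTF-16LE', $raw);
    $text =~ s/\x{0}.*\z//s;    # drop the terminating NUL and anything after it
    return $text;
}

# Simulated buffer: "A", GREEK CAPITAL OMEGA, terminating NUL.
my $raw = encode('UTF-16LE', "A\x{3a9}\x{0}");

# What a C-string style copy keeps: everything before the first 0 byte.
my ($truncated) = $raw =~ /\A([^\0]*)/;
printf "full buffer: %d bytes, truncated copy: %d byte(s)\n",
    length($raw), length($truncated);

# With the whole buffer in hand, decoding is straightforward.
my $text = utf16_buffer_to_string($raw);
print join(" ", map { sprintf "%x", ord } split //, $text), "\n";
```

Because "A" is encoded as the bytes 41 00, the truncated copy stops after a single byte, which matches what I see coming back from GetAs.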

Win32::Clipboard is written in XS, so the module isn't easily updated. Are there any plans to extend it, or does anyone have a solution already?

Here is a simple program that illustrates the problem. Use the Keyboard Map utility to select characters that are not in the ANSI code page, or copy text containing such characters from a document. Then run this to see what Perl thinks is on the clipboard.

${^WIDE_SYSTEM_CALLS}=1;
use strict;
use warnings;
use utf8;
use Win32::Clipboard;

# Print the caption and raw string, then return the string's
# character codes as space-separated hex.
sub dumpstring {
    my ($caption, $x) = @_;
    my $outstring = '';
    open my $out, ">", \$outstring or die;
    print "$caption: ($x) ";
    # /s so that newlines are dumped too; capture instead of using $&
    while ($x =~ /(.)/gs) {
        printf {$out} "%x ", ord($1);
    }
    close $out;
    return $outstring;
}

my $CLIP = Win32::Clipboard();
my $x = $CLIP->Get();
print dumpstring("ANSI contents", $x), "\n";

my @formats = $CLIP->EnumFormats();
print "Formats: @formats\n";

# 13 is the clipboard format number for Unicode text (CF_UNICODETEXT)
$x = $CLIP->GetAs(13);
print dumpstring("GetAs Unicode contents", $x), "\n";
—John

In reply to Win32::Clipboard and Unicode by John M. Dlugosz
