John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

The Win32::Clipboard module only grabs the contents as ANSI text, so any characters that cannot be represented in the current code page turn into '?'.

With the more general GetAs method, the data comes back in the expected underlying Windows representation, but it is truncated at the first 0 byte! If it returned the contents properly as a buffer (the full stated length, embedded NULs included), I could convert it into a Perl-friendly format myself, but as it stands I'm stuck.

Win32::Clipboard is written in XS, so the module isn't easily updated. Are there any plans to extend it, or does anyone already have a solution?

Here is a simple program that illustrates the problem. Use the Character Map utility to select characters that are not in the ANSI code page (or copy such text from a document), then run this to see what Perl thinks is on the clipboard.

${^WIDE_SYSTEM_CALLS} = 1;
use strict;
use warnings;
use utf8;
use Win32::Clipboard;

# Print the caption and the raw string; return its code points in hex.
sub dumpstring {
    my ($caption, $x) = @_;
    my $outstring = '';
    open my $out, ">", \$outstring or die "Cannot open in-memory handle: $!";
    print "$caption: ($x) ";
    while ($x =~ /(.)/sg) {    # /s so newlines are dumped too
        printf {$out} "%x ", ord($1);
    }
    close $out;
    return $outstring;
}

my $CLIP = Win32::Clipboard();
my $x = $CLIP->Get();
print dumpstring("ANSI contents", $x), "\n";

my @formats = $CLIP->EnumFormats();
print "Formats: @formats\n";

$x = $CLIP->GetAs(13);    # 13 == CF_UNICODETEXT
print dumpstring("GetAs Unicode contents", $x), "\n";
—John

Re: Win32::Clipboard and Unicode
by tye (Sage) on Apr 22, 2003 at 18:05 UTC

    The MS API GetClipboardData provides no way to get the length of the data returned. Quite an unfortunate design.

    The patch for GetAs to handle CF_UNICODETEXT is rather small. Add a CF_UNICODETEXT case that determines the length of the data by searching for a 16-bit zero character, i.e. two zero bytes at an even offset (or, better, by using an MS API that does this, such as lstrlenW). You could even tell Perl that the string is Unicode, since Perl's support for that is starting to mature.
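A caveat on searching for "two adjacent zero bytes": if the search runs byte by byte at any offset, it can pair the zero high byte of one UTF-16LE character with the zero low byte of the next (for "AB" it stops one byte early, and 'A' followed by U+0100 looks terminated after one byte). The scan has to step through whole 16-bit units. The C sketch below contrasts the two; the function names are hypothetical, and it assumes the data is UTF-16LE (as CF_UNICODETEXT is on Windows) and that a zero terminator is present.

```c
#include <assert.h>
#include <stddef.h>

/* Naive: first pair of zero bytes at ANY offset. Wrong for UTF-16LE,
   because a character's zero high byte can pair with the next zero low byte. */
size_t naive_len_bytes(const unsigned char *p)
{
    size_t i = 0;
    assert(p != NULL);
    while (!(p[i] == 0 && p[i + 1] == 0))
        i++;
    return i;
}

/* Aligned: step through whole 16-bit units (even offsets) and stop at the
   first unit equal to zero, as wcslen/lstrlenW do on wide strings.
   Returns the length in bytes, excluding the terminator. */
size_t utf16_len_bytes(const unsigned char *p)
{
    size_t i = 0;
    assert(p != NULL);
    while (p[i] != 0 || p[i + 1] != 0)
        i += 2;
    return i;
}
```
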

    I'll let you track down the details. It shouldn't be terribly difficult, just a bit time-consuming. If you run into road blocks, let us know.
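On the "tell Perl the string is Unicode" point: Perl does not store character strings as UTF-16, so the bytes would have to become UTF-8 before the scalar's UTF-8 flag is turned on (in XS, for example, with sv_setpvn followed by SvUTF8_on). From Perl code, Encode::decode('UTF-16LE', $bytes) does this conversion. Inside an XS patch, a hand-rolled converter along these lines is one option; this is a minimal sketch with a hypothetical function name, and replacing unpaired surrogates with U+FFFD is a design choice of the sketch, not something the module does.

```c
#include <assert.h>
#include <stddef.h>

/* Convert n_units UTF-16LE code units at src into UTF-8 at dst.
   dst must hold at least 3 * n_units bytes: a single unit yields at most
   3 bytes, and a surrogate pair (2 units) yields 4.
   Unpaired surrogates become U+FFFD. Returns the number of bytes written. */
size_t utf16le_to_utf8(const unsigned char *src, size_t n_units,
                       unsigned char *dst)
{
    size_t i = 0, out = 0;
    assert(src != NULL && dst != NULL);
    while (i < n_units) {
        unsigned long cp = src[2 * i] | ((unsigned long)src[2 * i + 1] << 8);
        i++;
        if (cp >= 0xD800 && cp <= 0xDBFF && i < n_units) {
            /* High surrogate: combine with a following low surrogate. */
            unsigned long lo = src[2 * i] | ((unsigned long)src[2 * i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            } else {
                cp = 0xFFFD; /* next unit is processed on its own */
            }
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* lone low surrogate, or high surrogate at the end */
        }
        if (cp < 0x80) {
            dst[out++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The byte count returned here, together with dst, is what would be handed to Perl when building the scalar.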

    You could also provide an alternate interface similar to GetAs that returns the actual pointer instead of trying to copy the pointed-at string into a Perl variable. This would allow you to grab as much data as you want from that pointer using Perl code:

    SV *
    GetAsPointer( format )
        int format
      CODE:
        RETVAL = &PL_sv_undef;
        if( OpenClipboard(NULL) ) {
            char *data = (char *)GetClipboardData((UINT)format);
            /* return the pointer value itself, packed into a string */
            RETVAL = newSVpvn( (char *)&data, sizeof(data) );
            CloseClipboard();
        }
      OUTPUT:
        RETVAL
    but note that it is slow and painful to use Perl to pull the data out of such a pointer until you find the end of the string:
    my $ptr = GetAsPointer(13);
    my $len = 0;
    my $head;
    do {
        $len += 2;    # (update)
        $head = unpack "P$len", $ptr;
    } until( "\0\0" eq substr($head, -2) );
    (updated)

                    - tye
      I suppose doing a simple patch would be easier than trying to write valid XS from scratch, especially since the interface doesn't change.

      Where can I find the source code for the current/latest version of Win32::Clipboard? (I have ActiveState Perl, which is a binary distribution. Their "source code" download AP806_source.zip doesn't contain any file named Clipboard.*)

      The MS API GetClipboardData provides no way to get the length of the data returned. Quite an unfortunate design.

      Hmm, it actually returns a HANDLE to a global memory block, which, as far as I can tell, is the only remaining use for such a thing. In Win32, HGLOBALs have been replaced by ordinary memory pointers, and their documentation is mostly missing.

      Anyway, the Win32 function GlobalSize will return the length. Call it first, then memcpy that many bytes from the pointer GlobalLock returns, instead of using strcpy. However, the documentation also says, “The size of a memory block may be larger than the size requested when the memory was allocated,” so it may report the rounded-up capacity rather than the requested size.
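Combining GlobalSize with the terminator search covers both concerns: the scan never reads past the block, and padding from a rounded-up allocation is not copied. The portable C sketch below takes the block's pointer and capacity as plain arguments; on Windows they would presumably come from GlobalLock and GlobalSize on the clipboard handle (followed by GlobalUnlock), but that wiring is an assumption and is left out so the sketch compiles anywhere. The function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy UTF-16 text out of a block whose capacity (e.g. what GlobalSize
   reports) may exceed the text. Scans whole 16-bit units, never reads past
   cap_bytes, and stops at the first zero unit. Returns a malloc'd copy that
   the caller must free, with *len_bytes set to the text length excluding the
   terminator; returns NULL if allocation fails. */
unsigned char *copy_utf16_bounded(const unsigned char *src, size_t cap_bytes,
                                  size_t *len_bytes)
{
    size_t i = 0;
    unsigned char *copy;
    assert(src != NULL && len_bytes != NULL);
    while (i + 1 < cap_bytes && (src[i] != 0 || src[i + 1] != 0))
        i += 2;
    copy = malloc(i ? i : 1); /* avoid malloc(0) for empty text */
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, i);
    *len_bytes = i;
    return copy;
}
```

If no terminator turns up inside the capacity, the sketch simply stops at the last whole unit rather than reading further.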

      If a function returned a pointer like you suggest, rather than a copy of the data, how would Perl know to free it eventually?

      To find the bytes 00 00, why not use a regex instead of eq and substr? /.*?\0\0/ or some such.

      —John

        Perhaps the returned pointer points just beyond some structure that contains the length information. That sounds familiar. It is worth investigating. My conclusions were based on rather quick checks and my not imagining that possibility.

        I have updated the (previously broken) code, so the use of substr may make more sense to you now. I assume that by using a regex to search for \0\0 you are thinking of grabbing some large chunk of data and then looking for the end. If you grab too large a chunk, you can cause an access violation, so the only totally safe approach is to grab two more bytes each time. I'd still use index over a regex if you try that.

        I don't free the data returned and I don't recall the module doing so either. So I probably have a memory leak and the module might as well. If so, that is something else to consider fixing in your patch. (:

        As to your other reply asking about the alternative to XST_mPV() that lets you specify a length... I use a very limited set of XS items that I find are the most robust. I never access or manipulate the stack directly; I just use a return type of "SV *" and set RETVAL. You'll probably have to look up that macro in the *.h files and roll your own, probably ST(0) = newSVpvn(...), perhaps with some mortalization/ref-count munging.

        Some authors of Win32 modules don't bother to put their modules on CPAN. I avoid using such. Lots of Win32 modules are available on CPAN. All of my released ones are.

                        - tye
      Well, I found the module on CPAN. I thought the stuff that comes with Perl wasn't also on CPAN, but I guess that doesn't extend to platform-specific modules.

      The code is basically:

      HANDLE myhandle = GetClipboardData(CF_TEXT);
      XST_mPV(0, (char *) myhandle);
      It seems that treating the HANDLE as the memory pointer is only documented as valid when the block was allocated as GMEM_FIXED. However, SetClipboardData documents that the HANDLE must be allocated with the GMEM_MOVEABLE flag!

      Anyway, the function XST_mPV will copy a string (I presume a nul-terminated single-byte sequence) to the proper place. What's the equivalent function that takes a length, which I could call after I figure out the proper length of the data? The perlapi document that shows this macro doesn't list a similar one that takes a byte array or buffer or whatnot.

      —John