John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

OLE Automation passes strings back and forth as Unicode, in UCS-2 format (16 bits per character).

So why does a call into a Word Document object give me back a string full of 8-bit characters from the Windows character set? For example, it uses 0x92 for a right single quote.

I can only suppose that somewhere the OLE module is converting Unicode to the current code page instead of to UTF-8.
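You can confirm that 0x92 is the Windows-1252 (ANSI code page) form of U+2019 RIGHT SINGLE QUOTATION MARK by decoding it with the core Encode module. This is a minimal sketch. It assumes the machine's ANSI code page is cp1252, as on US and Western European Windows, and it does not use Win32::OLE at all.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The single byte Word handed back (assumed to be in the cp1252 code page).
my $bytes = "\x92";

# Decode from cp1252 into a Perl character string.
my $text = decode('cp1252', $bytes);

# Show which Unicode code point that byte stands for.
printf "U+%04X\n", ord $text;   # prints U+2019
```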

Is that something I can change? Is that a bug?

—John

Re: Win32::OLE and character sets
by HyperZonk (Friar) on Jul 24, 2001 at 03:14 UTC
    The Win32::OLE module documentation describes this behavior. Of particular interest to you is probably Win32::OLE->Option(), which lets you change the code page used to translate strings between Perl and the OLE object (the CP option).
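    The following sketch shows what the CP option changes by encoding the same character both ways with Encode. It assumes two things. First, that the default translation (which the Win32::OLE documentation gives as CP_ACP, the system ANSI code page) is cp1252 on this machine. Second, that under CP_UTF8 the quote arrives as its UTF-8 byte sequence. The sketch does not call Win32::OLE.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+2019 RIGHT SINGLE QUOTATION MARK as a Perl character string.
my $quote = "\x{2019}";

# The byte sequences the two code page settings would produce for it:
# the ANSI code page (assumed cp1252 here) versus UTF-8.
my $ansi = encode('cp1252', $quote);
my $utf8 = encode('UTF-8',  $quote);

# Dump each result as hex bytes.
printf "cp1252: %s\n", join ' ', map { sprintf '%02X', ord } split //, $ansi;  # 92
printf "UTF-8:  %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;  # E2 80 99
```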

    -HZ
      use v5.6.1;
      use strict;
      use warnings;
      use utf8;
      use Win32::OLE;
      use Carp;

      Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);

      I don't think I missed anything.

      The strings returned from OLE are indeed UTF-8 encoded, but they are still marked as byte strings: Perl's UTF8 flag is not set on them!

      Now that's arguably a bug.
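      Until the module sets that flag itself, one workaround is to decode the returned bytes explicitly. The sketch below relies on several assumptions. A literal string stands in for a value returned by a Word call made with CP => CP_UTF8. The script needs Perl 5.8.1 or later for utf8::decode and utf8::is_utf8, and Encode::decode('UTF-8', $s) would do the same job. Note that use utf8 does not help here, because it only declares that the script's own source text is UTF-8.

```perl
use strict;
use warnings;

# Stand-in for an OLE return value under CP_UTF8: the UTF-8 bytes of
# U+2019 in a string whose UTF8 flag is off (a byte string).
my $s = "\xE2\x80\x99";
printf "before: length %d, flagged %s\n",
    length $s, utf8::is_utf8($s) ? 'yes' : 'no';   # length 3, flagged no

# Reinterpret the bytes as UTF-8 in place. This returns false on malformed input.
utf8::decode($s) or die "not valid UTF-8";
printf "after:  length %d, flagged %s, U+%04X\n",
    length $s, utf8::is_utf8($s) ? 'yes' : 'no', ord $s;   # length 1, flagged yes, U+2019
```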

      —John

      Thanks. I'll have to go through the docs again from the top. I seem to have missed that when I re-read them last week.