meetraz has asked for the wisdom of the Perl Monks concerning the following question:

Hello All.

I'm writing an XS module that will interface with the Win32 API. Most of the API and library functions take Unicode strings as input, with types such as "LPCWSTR" (a pointer to a constant array of wchar_t).

When I check the lib\ExtUtils\typemap file, there is an entry for wchar_t* which seems to be what I need. But the C code that XS produces just casts char* to wchar_t*, which doesn't make any sense, and the library functions don't like it.

I know I can do the conversion manually like this:

    // get function argument
    char* NodeName = (char *)SvPV(ST(0),PL_na);
    // get argument length
    int nNodeNameLen = strlen(NodeName) + 1;
    // alloc memory
    LPWSTR wNodeName = (LPWSTR)malloc(nNodeNameLen * sizeof(WCHAR));
    // do conversion
    mbstowcs(wNodeName, NodeName, nNodeNameLen);
    // Here's where I'd actually use the data
    // I'm leaving out the part where I allocate widebuffer
    someLibFunction(wNodeName, widebuffer, buffersize);
    // de-allocate
    free(wNodeName);
    /* And now I'd have to do it all in reverse: allocate a new char* buffer,
       convert the output back to char*, deallocate the wide buffer, put the
       result back on the stack, and free the wide buffer. */

Yes, that's one way to do it, and it works fine. But it sure seems like a lot of work for every string, in every function, in every module. Even if I write a function to help with some of it, I would still have to free() the strings manually after I was done using them. That doesn't sound very Perl-like. There has to be an easier way to do it. Does anybody know? I'd prefer something that handles memory allocation and deallocation automatically.

I've already looked at SvPVutf8() and sv_utf8_upgrade(), but neither of these appears to do what I need. I've also tried passing in the string as Unicode from the Perl side, using pack/unpack, utf8, or Encode, but that doesn't seem to work either. Is Perl's notion of Unicode not the same as "array of wchar_t"? Or am I confusing two different things?

Re: Unicode/Wide strings and XS
by PodMaster (Abbot) on Jan 09, 2004 at 07:30 UTC
    Did you read "Using Unicode in XS" in perlunicode?


      Yes, see above where I mentioned SvPVutf8() and sv_utf8_upgrade(). These don't seem to work. Perhaps "Perl Unicode" and "UTF8" are not the same thing as a "Wide String" as referred to by the Win32 API?

      Update:

      Here's an example of what I'm doing:

      // Nodename is an SV* holding "examplenode"
      printf("Nodename=[%s]\n", SvPV_nolen(Nodename));
      // output: "Nodename=[examplenode]"
      sv_utf8_upgrade(Nodename);
      wprintf(L"Got [%s]\n", (wchar_t*)SvPVutf8_nolen(Nodename));
      // output: "Got []"

      Am I doing something wrong? The string seems to get clobbered, or it's in a format that Windows won't accept as a "wide string".
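
      To illustrate why the cast alone cannot work, here is a minimal sketch in plain C (no Perl or Win32 headers). It assumes a 16-bit, little-endian wchar_t as on Windows, and the helper name first_unit_if_cast is made up for this example. A cast does not convert anything: it only makes each pair of bytes get read as one 16-bit unit.

```c
#include <stdint.h>

/* Hypothetical helper: read the first two bytes of a narrow string the
   way a (wchar_t*) cast would on Windows, i.e. as one little-endian
   16-bit code unit. No conversion takes place. */
static uint16_t first_unit_if_cast(const char *s)
{
    const unsigned char *b = (const unsigned char *)s;
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* What a correctly converted wide string starts with: the first byte,
   zero-extended to 16 bits. Valid for ASCII input only. */
static uint16_t first_unit_wanted(const char *s)
{
    return (uint16_t)(unsigned char)s[0];
}
```

      For "examplenode", first_unit_if_cast() gives 0x7865 ('e' is 0x65, 'x' is 0x78), while a real wide string would start with 0x0065. So the API sees roughly half as many characters, none of them the intended ones, and may run past the end of the buffer looking for a 16-bit terminator.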
        Perhaps "Perl Unicode" and "UTF8" are not the same thing as a "Wide String" as referred to by the Win32 API?

        Unicode includes more than one encoding ("transformation format"). Perl uses UTF-8 internally. Windows calls UTF-16 "UNICODE" or "wide" (early versions of Windows NT used only fixed-width 16-bit characters, that is, UCS-2; Windows 2000 and later accept UTF-16 surrogate pairs). Windows calls byte-oriented code-page encodings "multi-byte", and UTF-8 is one of them (code page CP_UTF8).

        Update: And Perl doesn't support UTF-16 directly, although the core Encode module (Perl 5.8) can convert to and from it. This could also be expressed as "Perl supports neither multi-word nor single-word characters" using some of my loose terminology from above.

                        - tye

        [Updated]
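
        The encoding difference described above can be sketched as a small conversion routine. This is an illustration in portable C, not code from the thread: the name utf8_to_utf16 is hypothetical, validation is minimal (overlong sequences are not rejected), and uint16_t stands in for the Windows WCHAR type. On Windows itself, MultiByteToWideChar() called with CP_UTF8 performs this conversion; note that mbstowcs() converts according to the current C locale, which is not necessarily UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-8 string to UTF-16 code units.
   Writes at most cap units into out, including the terminating 0.
   Returns the number of units written (excluding the terminator),
   or (size_t)-1 on malformed input or if out is too small. */
static size_t utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p) {
        uint32_t cp;
        int extra;

        /* Decode the lead byte: it tells how many continuation bytes follow. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80) return (size_t)-1; /* truncated sequence */
            cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
        }

        /* Surrogate code points and values past U+10FFFF are not valid. */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return (size_t)-1;

        if (cp < 0x10000) {
            /* One 16-bit unit; keep room for the terminator. */
            if (n + 1 >= cap) return (size_t)-1;
            out[n++] = (uint16_t)cp;
        } else {
            /* Outside the BMP: encode as a surrogate pair (two units). */
            if (n + 2 >= cap) return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= cap) return (size_t)-1;
    out[n] = 0;
    return n;
}
```

        With the bytes from SvPVutf8() as input, "examplenode" becomes 11 units plus a terminator, and a character outside the BMP becomes two units. As for freeing the result automatically, one option in XS might be to let a mortal SV own the output buffer, for example one created with sv_2mortal(newSV(nbytes)), so that Perl releases it after the XSUB returns; that part is not shown here.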