Unicode/Wide strings and XS

meetraz has asked for the wisdom of the Perl Monks concerning the following question:

Hello All.

I'm writing an XS module that will interface with the Win32 API. Most of the API and library functions take Unicode strings as input, such as the type "LPCWSTR" (a pointer to a constant array of wchar_t).

When I check the lib\ExtUtils\typemap file, there is an entry for wchar_t* which seems to be what I need. But the C code that XS produces just casts char* to wchar_t*, which doesn't make any sense, and the library functions don't like it.

I know I can do the conversion manually like this:

// get function argument
char* NodeName = (char *)SvPV(ST(0),PL_na);

// get argument length
int nNodeNameLen = strlen(NodeName) + 1;

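// alloc memory (one WCHAR per input byte, plus the terminator, is always enough)
LPWSTR wNodeName = (LPWSTR)malloc(nNodeNameLen * sizeof(WCHAR));
if (wNodeName == NULL)
    croak("Out of memory converting NodeName");

// do conversion (mbstowcs decodes the bytes using the current C locale)
mbstowcs(wNodeName, NodeName, nNodeNameLen);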

// Here's where I'd actually use the data
// I'm leaving out the part where I allocate widebuffer
someLibFunction(wNodeName, widebuffer, buffersize);

// de-allocate
free(wNodeName);

/*
    And now I'd have to do it all in reverse:
    allocate a new char* buffer, convert the output
    back to char*, put the result back on the stack,
    and free both the wide buffer and the char* buffer.
*/

Yes, that's one way to do it, and it works fine. But it sure seems like a lot of work for every string, in every function, in every module. Even if I write a function to help with some of it, I would still have to free() the strings manually after I was done using them. That doesn't sound very perl-like. There has to be an easier way to do it... Does anybody know? I prefer to have something that handles memory allocation/deallocation automatically.

I've already looked at SvPVutf8() and sv_utf8_upgrade(), but neither of these appears to do what I need. I've also tried passing in the string as Unicode from the Perl side, using pack/unpack, utf8, or Encode, but that doesn't seem to work either. Is Perl's notion of Unicode not the same as "array of wchar_t"? Or am I confusing two different things?


Re: Unicode/Wide strings and XS
by PodMaster (Abbot) on Jan 09, 2004 at 07:30 UTC

perlunicode

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.


Re: Re: Unicode/Wide strings and XS

by meetraz (Hermit) on Jan 09, 2004 at 15:46 UTC

Update:

Here's an example of what I'm doing:

// Nodename is SV* holding "examplenode"
printf("Nodename=[%s]\n", SvPV_nolen(Nodename)); // output: "Nodename=[examplenode]"

sv_utf8_upgrade(Nodename);

wprintf(L"Got [%s]\n", (wchar_t*)SvPVutf8_nolen(Nodename)); // output: "Nodename=[]"
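
One way to see why the cast in the snippet above cannot produce the expected text is to look at the 16-bit units a wide-character function would read from the same bytes. The following is only a sketch: it assumes a little-endian machine with 16-bit wide characters (as on Win32), and the helper name `reinterpret_as_utf16le` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes the 16-bit units a little-endian machine
 * would see if a byte buffer were read as wide characters.
 * Bytes are paired up in order; a trailing odd byte is ignored.
 * Returns the number of units written to out (at most out_cap). */
size_t reinterpret_as_utf16le(const char *bytes, size_t nbytes,
                              uint16_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < nbytes && n < out_cap; i += 2) {
        uint16_t lo = (unsigned char)bytes[i];      /* low byte comes first */
        uint16_t hi = (unsigned char)bytes[i + 1];  /* high byte second */
        out[n++] = (uint16_t)(lo | (hi << 8));
    }
    return n;
}
```

For "examplenode" the first unit is built from 'e' (0x65) and 'x' (0x78) and comes out as 0x7865, which is not the letter "e" (0x0065). A cast changes only how the bytes are read, not the bytes themselves, so an actual conversion step is still needed.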

Re^3: Unicode/Wide strings and XS ("unicode")

by tye (Sage) on Jan 09, 2004 at 16:41 UTC

Perhaps "Perl Unicode" and "UTF8" are not the same thing as a "Wide String" as referred to by the Win32 API?

Unicode includes more than one encoding ("transformation format"). Perl uses UTF-8. Windows calls UTF-16 "UNICODE" or "wide" (or something close to UTF-16: I don't think Windows supports the multi-word characters that are part of UTF-16; that is, Windows "UNICODE" appears to use only fixed-width 16-bit characters). Windows calls UTF-8 "multi-byte".

Update: And Perl doesn't support UTF-16 directly (there may be modules that deal with it), which could also be expressed as "Perl doesn't support multi-word nor single-word characters" using some of my loose terminology from above.
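
To make the difference between the two encodings concrete, here is a minimal, portable sketch of decoding UTF-8 bytes into UTF-16 code units. It follows UTF-16 as specified, including surrogate pairs for code points above U+FFFF. The function name `utf8_to_utf16` and the `UTF16_ERR` constant are made up for this example; it is not Perl's or Windows' implementation. In a real Win32 XS module the system call MultiByteToWideChar with CP_UTF8 would be the more usual choice.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF16_ERR ((size_t)-1)  /* hypothetical error marker for this sketch */

/* Decode len bytes of UTF-8 from src into UTF-16 code units.
 * If dst is NULL, only the required number of units is counted.
 * dst_cap is the capacity of dst in units, including the terminating 0.
 * Returns the number of units (without the terminator) or UTF16_ERR.
 * Simplification: overlong UTF-8 forms are not rejected. */
size_t utf8_to_utf16(const unsigned char *src, size_t len,
                     uint16_t *dst, size_t dst_cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = src[i++];
        uint32_t cp;
        size_t extra, units;

        /* The lead byte tells how many continuation bytes follow. */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return UTF16_ERR;

        if (len - i < extra) return UTF16_ERR;      /* truncated sequence */
        for (; extra > 0; extra--) {
            unsigned char c = src[i++];
            if ((c & 0xC0) != 0x80) return UTF16_ERR;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return UTF16_ERR;                       /* not a valid scalar value */

        units = (cp >= 0x10000) ? 2 : 1;
        if (dst != NULL) {
            if (dst_cap < n + units + 1) return UTF16_ERR;  /* keep room for 0 */
            if (units == 2) {
                uint32_t v = cp - 0x10000;          /* split into surrogate pair */
                dst[n]     = (uint16_t)(0xD800 | (v >> 10));
                dst[n + 1] = (uint16_t)(0xDC00 | (v & 0x3FF));
            } else {
                dst[n] = (uint16_t)cp;
            }
        }
        n += units;
    }
    if (dst != NULL) {
        if (dst_cap < n + 1) return UTF16_ERR;
        dst[n] = 0;                                 /* terminate the wide string */
    }
    return n;
}
```

Calling it once with a NULL dst gives the buffer size to allocate, and a second call fills the buffer. For plain ASCII such as "examplenode" every byte becomes one 16-bit unit, so the 11 input bytes turn into 11 units (22 bytes) plus a terminator.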

- tye
