I was going to say that a C implementation of the UTF-16 to UTF-8 conversion would be pretty simple and robust -- in fact, you can probably find a C snippet for this at http://www.unicode.org.
But it's true that if you mistakenly feed random (non-UTF-16) data into this sort of conversion, the result might be worse than just "garbage out".
There are a fair number of "gaps" in the 16-bit space where Unicode doesn't really have anything defined, as well as some spots that are specifically defined as "not usable characters". And heaven forbid the input data should contain anything in the UTF-16 "surrogate" range (0xD800-0xDFFF), which is reserved for building "wider" characters out of two consecutive 16-bit values (those get rendered as 4-byte UTF-8 sequences, whereas all other code points end up as 1, 2 or 3 bytes in UTF-8).
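For concreteness, here's a minimal, untested C sketch of such a converter -- not the unicode.org snippet, just an illustration of the byte-length rules and the surrogate handling described above. The function name and the caller-supplied-buffer convention are my own:

    /* Sketch: convert n UTF-16 code units to UTF-8.
     * 'out' must have room for at least 3 bytes per input code unit.
     * Returns the number of UTF-8 bytes written, or -1 on a lone surrogate
     * (i.e. input that isn't well-formed UTF-16). */
    #include <stdint.h>
    #include <stddef.h>

    ptrdiff_t utf16_to_utf8(const uint16_t *in, size_t n, unsigned char *out)
    {
        unsigned char *o = out;
        for (size_t i = 0; i < n; i++) {
            uint32_t cp = in[i];

            if (cp >= 0xD800 && cp <= 0xDBFF) {        /* high surrogate */
                if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                    return -1;                         /* unpaired: bad input */
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                i++;                                   /* consumed the pair */
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                return -1;                             /* stray low surrogate */
            }

            if (cp < 0x80) {                           /* 1 byte  */
                *o++ = (unsigned char)cp;
            } else if (cp < 0x800) {                   /* 2 bytes */
                *o++ = 0xC0 | (cp >> 6);
                *o++ = 0x80 | (cp & 0x3F);
            } else if (cp < 0x10000) {                 /* 3 bytes */
                *o++ = 0xE0 | (cp >> 12);
                *o++ = 0x80 | ((cp >> 6) & 0x3F);
                *o++ = 0x80 | (cp & 0x3F);
            } else {                                   /* 4 bytes (from a pair) */
                *o++ = 0xF0 | (cp >> 18);
                *o++ = 0x80 | ((cp >> 12) & 0x3F);
                *o++ = 0x80 | ((cp >> 6) & 0x3F);
                *o++ = 0x80 | (cp & 0x3F);
            }
        }
        return o - out;
    }

Note that this deliberately fails on unpaired surrogates rather than letting them through -- that's exactly the "worse than garbage out" case mentioned above.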
Win32 comes with APIs for converting from UTF-16 (or perhaps something similar, in any case likely referred to as "UNICODE") to UTF-8 (likely called "multi-byte-character strings"). Unfortunately, I'm on the wrong computer with too tiny a browser to easily look up the name.
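For what it's worth, the API I believe is being referred to is WideCharToMultiByte() with the CP_UTF8 code page (MultiByteToWideChar() goes the other way). A rough, untested sketch of the usual two-call pattern -- the helper name here is made up:

    /* Sketch: convert a NUL-terminated UTF-16 (wide) string to a
     * malloc'd UTF-8 string using the Win32 API.  Caller must free(). */
    #include <windows.h>
    #include <stdlib.h>

    char *wide_to_utf8(const wchar_t *wide)
    {
        /* First call: ask how many bytes the UTF-8 result needs (incl. NUL). */
        int bytes = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
        if (bytes == 0)
            return NULL;                   /* conversion failed */

        char *utf8 = malloc((size_t)bytes);
        if (utf8 == NULL)
            return NULL;

        /* Second call: do the actual conversion into the buffer. */
        if (WideCharToMultiByte(CP_UTF8, 0, wide, -1, utf8, bytes, NULL, NULL) == 0) {
            free(utf8);
            return NULL;
        }
        return utf8;
    }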
I prefer to do such conversions in Perl anyway, as it reduces the complexity of the XS code (almost always a good idea) and lets you avoid converting twice if you end up just passing the output from one API into another.
If one really wants to do this conversion in C, then I'd strongly encourage providing an XS routine that does just this conversion, and then providing a Perl sub that conveniently wraps the two (or more) XS calls for the "common case".