UTF-16 is not a 'larger character set' than UTF-8.
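UTF-16 is an 'encoding', a method of storing characters in memory. It encodes most characters (everything in the Basic Multilingual Plane, which covers virtually all text in common use) as a single 16-bit unit, and the rest as a pair of 16-bit units (a surrogate pair). Windows NT Unicode strings are UTF-16 encoded.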
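UTF-8 is another encoding, and the one Perl uses internally for character strings. It encodes each of the original 7-bit ASCII characters as a single byte, identical to its ASCII value (and so to the way it is encoded in the Windows 'ANSI' code pages); every other character takes two to four bytes.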
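To make the difference concrete, here is a small sketch that encodes three sample characters both ways and compares the byte counts. The helper name encoded_lengths and the sample characters are my own choices for illustration; the only library call is encode() from the Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns (bytes in UTF-8, bytes in UTF-16) for a string.
# UTF-16BE is used so that no BOM is added to the count.
sub encoded_lengths {
    my ($chars) = @_;
    return ( length( encode('UTF-8',    $chars) ),
             length( encode('UTF-16BE', $chars) ) );
}

# An ASCII letter, a BMP character, and a character outside the BMP.
my @samples = (
    [ 'A (U+0041)',         "A"         ],
    [ 'euro sign (U+20AC)', "\x{20AC}"  ],
    [ 'G clef (U+1D11E)',   "\x{1D11E}" ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    my ($utf8_len, $utf16_len) = encoded_lengths($char);
    printf "%-20s UTF-8: %d byte(s)  UTF-16: %d byte(s)\n",
        $name, $utf8_len, $utf16_len;
}
```

Both encodings can represent all three characters; only the number of bytes differs (1 vs 2, 3 vs 2, and 4 vs 4 respectively).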
If you have an application that's expecting UTF-16, you'll want to use the Encode module (core since Perl 5.8) to turn your string into one that Perl will emit as UTF-16:
use Encode;

my ($unicode_string, $utf16_string);

$unicode_string = get_a_unicode_string();
# ^^ this string is a character string, internally stored
# as UTF-8

$utf16_string = encode('utf16', $unicode_string);
# ^^ this string is an 'octet' (byte) string, internally
# stored as bytes. Each character of $unicode_string is stored in
# two bytes of $utf16_string (four bytes for characters outside
# the Basic Multilingual Plane).
# (Also note the presence of a UTF-16 BOM at the start.)

function_expecting_utf16($utf16_string);
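About that BOM: as far as I know, plain 'UTF-16' makes encode() prepend a byte order mark and write big-endian, while 'UTF-16LE' and 'UTF-16BE' write no BOM at all. If the receiving function wants raw little-endian code units (which is what Windows wide-character APIs generally expect), 'UTF-16LE' is probably the one to use. A quick sketch to see the difference; hex_bytes is a made-up helper just for display:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $octets;
}

my $text = "Hi";

my $with_bom = encode('UTF-16',   $text);   # BOM, then big-endian units
my $le       = encode('UTF-16LE', $text);   # little-endian units, no BOM

print hex_bytes($with_bom), "\n";   # FE FF 00 48 00 69
print hex_bytes($le),       "\n";   # 48 00 69 00
```

Which variant is right depends entirely on what the consumer expects, so check its documentation before picking one.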
Update:
$"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc
In reply to Re: Does Perl support unicode-16?
by Stevie-O
in thread Does Perl support unicode-16?
by jfroebe