Re: Does Perl support unicode-16?

Ah, the confusions surrounding Unicode. For something given a name that means 'one code' there sure are a lot of different ways to specify it...

UTF-16 is not a 'larger character set' than UTF-8.

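UTF-16 is an 'encoding', a method of storing characters in memory. It encodes most characters (everything in the Basic Multilingual Plane, which covers virtually all text in common use) as a single 16-bit unit, and the remaining ones as a pair of 16-bit units, a 'surrogate pair'. Windows NT Unicode strings are UTF-16 encoded.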
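To make the 'most characters in 16 bits' point concrete, here is a small sketch using the Encode module. The two sample characters are my own picks, and I use 'UTF-16BE' (the big-endian variant without a BOM) only so that the byte counts are easy to read.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{20AC}" is the euro sign, inside the Basic Multilingual Plane.
# "\x{1D11E}" is the musical G clef, outside it.
my $bmp_bytes    = encode('UTF-16BE', "\x{20AC}");
my $astral_bytes = encode('UTF-16BE', "\x{1D11E}");

# unpack 'H*' renders the octets as a hex string
printf "U+20AC  -> %d bytes: %s\n", length $bmp_bytes,    unpack('H*', $bmp_bytes);
printf "U+1D11E -> %d bytes: %s\n", length $astral_bytes, unpack('H*', $astral_bytes);
```

The first line should report 2 bytes (20ac) and the second 4 bytes (d834dd1e, a surrogate pair).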

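UTF-8 is another encoding, and the one Perl uses internally for character strings. It encodes each of the original 7-bit ASCII characters as a single byte, identical to its plain ASCII encoding (and to the ANSI code pages, which share that range). Every other character takes two to four bytes.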
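A quick way to check the ASCII-compatibility claim yourself, again as a sketch with sample characters of my own choosing:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $ascii_octets  = encode('UTF-8', 'A');       # U+0041, plain ASCII
my $accent_octets = encode('UTF-8', "\x{E9}");  # U+00E9, e with acute accent

# unpack 'H*' renders the octets as a hex string
printf "A      -> %s\n", unpack('H*', $ascii_octets);
printf "U+00E9 -> %s\n", unpack('H*', $accent_octets);
```

'A' comes out as the single byte 41, exactly its ASCII value, while U+00E9 becomes the two bytes c3a9.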

If you have an application that's expecting UTF-16, you'll want to use the Encode module (in core since Perl 5.8) to turn your string into one that Perl will emit as UTF-16:

use Encode;
my ($unicode_string, $utf16_string);

$unicode_string = get_a_unicode_string();

# ^^ this is a character string; Perl stores it internally
# as UTF-8

$utf16_string = encode('UTF-16', $unicode_string);

# ^^ this is an 'octet' (byte) string. Each character of
# $unicode_string takes up two bytes of $utf16_string
# (four bytes for characters outside the Basic Multilingual Plane).
# (Also note the presence of a UTF-16 BOM at the start.)

function_expecting_utf16($utf16_string);
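If you want to see the BOM for yourself, or need output without one, something like the following sketch may help. The sample text "Hi" is just an illustration. Whether your receiver wants a BOM, and which byte order it expects, is something you will have to check; Windows APIs, for instance, generally expect little-endian data.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "Hi";

# Plain 'UTF-16' emits a BOM (FE FF) followed by big-endian code units.
my $with_bom = encode('UTF-16', $text);

# 'UTF-16LE' (or 'UTF-16BE') fixes the byte order and emits no BOM.
my $le_plain = encode('UTF-16LE', $text);

print unpack('H*', $with_bom), "\n";
print unpack('H*', $le_plain), "\n";
```

This should print feff00480069 for the first string and 48006900 for the second.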

Update:

Fixed missing parens around multivariable my().
Added comment elaborating on $unicode_string.

(Thanks, ytsh)

--Stevie-O

$"=$,,$_=q>|\p4<6 8p<M/_|<('=>
.q>.<4-KI<l|2$<6%s!<qn#F<>;$,
.=pack'N*',"@{[unpack'C*',$_]
}"for split/</;$_=$,,y[A-Z a-z]
         {}cd;print lc
