Ah, the confusions surrounding Unicode. For something given a name that means 'one code' there sure are a lot of different ways to specify it...

UTF-16 is not a 'larger character set' than UTF-8.

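UTF-16 is an 'encoding', a method of storing characters in memory. It stores every character in the Basic Multilingual Plane (which covers virtually all characters in everyday use) as a single 16-bit unit; the remaining characters take two 16-bit units, known as a surrogate pair. Windows NT Unicode strings are UTF-16 encoded.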
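To make the "16 bits for most characters" point concrete, here is a small sketch that counts the bytes UTF-16 needs for two sample characters. The sample characters and variable names are my own picks for illustration, and I use the 'UTF-16BE' encoding name so that no BOM inflates the counts.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative sample characters (not from the original question):
# U+00E9 lies inside the Basic Multilingual Plane, U+1F600 lies outside it.
my $bmp_char    = "\x{E9}";
my $astral_char = "\x{1F600}";

# 'UTF-16BE' fixes the byte order, so Encode does not prepend a BOM.
my $bmp_bytes    = length encode('UTF-16BE', $bmp_char);
my $astral_bytes = length encode('UTF-16BE', $astral_char);

print "U+00E9 needs $bmp_bytes bytes, U+1F600 needs $astral_bytes bytes\n";
```

The first character fits in one 16-bit unit; the second needs a surrogate pair, so it takes twice as many bytes.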

UTF-8 is another encoding, and the one Perl uses internally. It encodes all of the original 7-bit ASCII characters as a single byte, identically to the way they are encoded in ASCII itself (and in the Windows 'ANSI' code pages); every other character takes two to four bytes.
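A quick way to see the variable width of UTF-8 is to encode a few characters and dump the resulting bytes as hex. This is only a sketch; the three sample characters and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative samples: an ASCII letter, a Latin-1 letter, and the euro sign.
my $ascii_hex   = unpack 'H*', encode('UTF-8', 'A');
my $e_acute_hex = unpack 'H*', encode('UTF-8', "\x{E9}");
my $euro_hex    = unpack 'H*', encode('UTF-8', "\x{20AC}");

# Two hex digits per byte, so the string length shows the byte count.
print "A      -> $ascii_hex\n";
print "U+00E9 -> $e_acute_hex\n";
print "U+20AC -> $euro_hex\n";
```

The ASCII letter comes out as the same single byte it has always been, while the other two take two and three bytes respectively.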

If you have an application that's expecting UTF-16, you'll want to use the Encode module (which has been in core since 5.8) to turn your string into one that Perl will emit as UTF-16:

    use Encode;

    my ($unicode_string, $utf16_string);

    $unicode_string = get_a_unicode_string();
    # ^^ this string is a character string, internally stored
    #    as UTF-8

    $utf16_string = encode('UTF-16', $unicode_string);
    # ^^ this string is an 'octet' (byte) string, internally
    #    stored as bytes. Each BMP character of the original is
    #    stored in two bytes of $utf16_string.
    #    (Also note the presence of a UTF-16 BOM at the start.)

    function_expecting_utf16($utf16_string);
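Since the BOM is easy to overlook, here is a hedged sketch comparing the plain 'UTF-16' encoding with 'UTF-16LE'. The sample text and variable names are made up for illustration. Whether the consumer wants a BOM is an assumption you have to check: Windows APIs generally expect little-endian data without one, in which case 'UTF-16LE' is probably the name you want.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample text standing in for a real character string.
my $text = "Hi";

# 'UTF-16' without an explicit byte order: Encode prepends a BOM
# and writes the data big-endian.
my $with_bom_hex = unpack 'H*', encode('UTF-16', $text);

# 'UTF-16LE': little-endian, no BOM.
my $le_hex = unpack 'H*', encode('UTF-16LE', $text);

print "UTF-16   : $with_bom_hex\n";
print "UTF-16LE : $le_hex\n";
```

The first dump starts with the two BOM bytes (fe ff) followed by big-endian data; the second has no BOM and swaps the byte order within each 16-bit unit.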

Update:

(Thanks, ytsh)
--Stevie-O
$"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

In reply to Re: Does Perl support unicode-16? by Stevie-O
in thread Does Perl support unicode-16? by jfroebe
