Unicode is a character set. It is often described as 16 bit, but that only covers the first 65,536 code points (the Basic Multilingual Plane); the full range runs up to U+10FFFF. UTF-16 is an encoding where most characters in the input stream are represented with two bytes, just as a short integer would be, and characters beyond the 16-bit range take two such units (a surrogate pair). The problem with this is that it makes a lot of legacy C code (especially common on *NIX systems) choke and die horribly under most circumstances, as the encoding normally involves lots of null bytes, which the standard C string functions treat as end-of-string. UTF-8 is a kludge to prevent these problems. Basically what it does is map each character to a sequence of 1 to 4 bytes (the original design allowed up to 6), none of which is ever null unless the character itself is null. It also has a couple of other interesting properties: the seven-bit ASCII set is valid UTF-8, and no proper substring of a valid, shortest-form UTF-8 character representation is itself a valid character representation (this is useful at times).
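
To make the byte layouts above concrete, here is a small sketch in Python (chosen only for illustration; nothing here is specific to XML::Simple or to the code in this thread). The sample string and the variable names are my own assumptions. It encodes the same text both ways and checks for null bytes:

```python
# Compare how the same text is laid out in UTF-16 and in UTF-8.
# The sample string is illustrative: 'A' is plain ASCII and
# U+00A3 is the pound sign this thread is about.
text = "A\u00a3"

utf16 = text.encode("utf-16-be")  # big-endian, no byte-order mark
utf8 = text.encode("utf-8")

print(utf16.hex(" "))  # 00 41 00 a3
print(utf8.hex(" "))   # 41 c2 a3

# UTF-16 contains null bytes for ASCII-range characters; UTF-8 does not.
utf16_has_nul = b"\x00" in utf16
utf8_has_nul = b"\x00" in utf8

# Pure seven-bit ASCII text is byte-for-byte identical in UTF-8.
ascii_is_utf8 = "hello".encode("ascii") == "hello".encode("utf-8")

# A character outside the 16-bit range needs a surrogate pair in
# UTF-16 (4 bytes) and a 4-byte sequence in UTF-8.
astral = "\U0001F600"
astral_utf16_len = len(astral.encode("utf-16-be"))
astral_utf8_len = len(astral.encode("utf-8"))
```

The null byte before the `41` in the UTF-16 output is exactly what trips up C code that expects a null-terminated string.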

Anyway, the point is that it's pretty unlikely that you are going to work with UTF-16 encoding very often, although you might find yourself doing so on Win32, as Windows uses wide chars for everything internally, IIRC.

NOTE: caveat emptor. This is as I remember things working from when I last dealt with Unicode in detail; I can't promise I've got the details exactly right.

---
demerphq


In reply to Re^3: Problem reading £ sign with XML::Simple by demerphq
in thread Problem reading £ sign with XML::Simple by gothic_mallard
