
Unicode started out as a 16-bit character set, but it no longer is one: code points now run from U+0000 to U+10FFFF. UTF-16 is an encoding in which most characters in common use are represented by a single 16-bit unit, much as a short integer would be, and everything outside the Basic Multilingual Plane takes a pair of such units (a surrogate pair). The problem with this is that it makes a lot of legacy C code (especially on *NIX systems) choke and die horribly, because the encoding normally contains lots of null bytes, which the standard C string functions treat as end-of-string. UTF-8 was designed to avoid these problems. It maps each code point to a sequence of 1 to 4 bytes (the original design allowed up to 6), none of which is ever null unless the character itself is NUL. It also has a couple of other interesting properties: seven-bit ASCII is valid UTF-8 as it stands, and no proper substring of a valid multi-byte UTF-8 character is itself a valid character (this is useful at times, for example when you need to find a character boundary in the middle of a stream).
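To make the byte counts concrete, here is a minimal sketch using the core Encode module. The helper name hex_bytes and the choice of sample code points are mine, purely for illustration; nothing here is specific to XML::Simple.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper for this example: encode one character and
# return its bytes as space-separated hex.
sub hex_bytes {
    my ($encoding, $char) = @_;
    my $bytes = encode($encoding, $char);
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# 'A', the pound sign, the euro sign, and one code point outside the BMP.
for my $cp (0x41, 0xA3, 0x20AC, 0x1F600) {
    my $char = chr($cp);
    printf "U+%04X  UTF-8: %-12s UTF-16BE: %s\n",
        $cp, hex_bytes('UTF-8', $char), hex_bytes('UTF-16BE', $char);
}
```

The pound sign (U+00A3) comes out as C2 A3 in UTF-8 and 00 A3 in UTF-16BE, so the null byte only shows up in the UTF-16 form. 'A' stays a single byte 41 in UTF-8, and U+1F600 needs four UTF-8 bytes or a surrogate pair in UTF-16.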

Anyway, the point is that it's pretty unlikely that you are going to work with UTF-16 very often, although you might find yourself doing so on Win32, as internally Windows uses wide chars (UTF-16) for everything, IIRC.

NOTE: caveat emptor. This is how I remember things working from when I last dealt with Unicode in detail; I can't promise I've got the details exactly right.

---
demerphq
