And don't forget:
- UCS-4: “each encoded character is represented in a 32-bit quantity within a code space 0..7FFFFFFF”
Unicode defines a code space of 0..0x10FFFF, but ISO 10646 (not ISO 646, which is the old 7-bit standard) defines a space of 0..0x7FFFFFFF, i.e. 31-bit values. However, the highest plane is already for private use, and they promised to assign real codes starting from the bottom, so the smaller domain of Unicode should not be a problem until they actually run out.
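To put numbers on the two ranges, here is a small Python sketch. The constant and function names are mine, chosen only for illustration; the limits are the Unicode maximum and the UCS-4 range quoted above.

```python
# Upper bounds of the two code spaces (inclusive).
UNICODE_MAX = 0x10FFFF      # Unicode: 17 planes of 65,536 code points each
UCS4_MAX = 0x7FFFFFFF       # UCS-4 as quoted above: 31-bit values

# Number of code points in each space (the range starts at 0).
unicode_count = UNICODE_MAX + 1
ucs4_count = UCS4_MAX + 1

# Bits needed to hold the largest value in each space.
unicode_bits = UNICODE_MAX.bit_length()
ucs4_bits = UCS4_MAX.bit_length()


def in_unicode_range(n: int) -> bool:
    """Return True if n is inside the Unicode code space.

    Python's chr() accepts exactly 0..0x10FFFF and raises ValueError
    outside that range, so it doubles as a range check here.
    """
    try:
        chr(n)
        return True
    except ValueError:
        return False


print(unicode_count, unicode_bits)   # code points and bits for Unicode
print(ucs4_count, ucs4_bits)         # code points and bits for UCS-4
print(in_unicode_range(0x10FFFF), in_unicode_range(0x110000))
```

So anything that fits in Unicode fits in a UCS-4 slot with plenty of room to spare, while the reverse is not true for values above 0x10FFFF.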
Yes, there is a big difference between the code points and the encodings. A capital 'A' is the value 65. How you store the 65 in your program is beside the point. It could be a 7-bit integer, a 64-bit integer, a floating-point number, a string of EBCDIC digits, Huffman-encoded variable-length fields, or whatever.
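A quick Python sketch of that point: the code point is just the number 65, and the storage is whatever you choose. The variable names are made up for the example, and `cp037` is only one of several EBCDIC code pages, picked here as an assumption.

```python
import struct

# The code point itself: just a number.
code_point = ord("A")

# Three different ways to store that same number.
as_int64 = code_point.to_bytes(8, "big")             # 64-bit big-endian integer
as_float = struct.pack(">d", float(code_point))      # IEEE 754 double
as_ebcdic_digits = str(code_point).encode("cp037")   # the digits "65" in EBCDIC

# Each representation looks different in memory...
print(as_int64.hex(), as_float.hex(), as_ebcdic_digits.hex())

# ...but each one decodes back to the same code point.
from_int64 = int.from_bytes(as_int64, "big")
from_float = int(struct.unpack(">d", as_float)[0])
from_ebcdic = int(as_ebcdic_digits.decode("cp037"))

print(chr(from_int64), chr(from_float), chr(from_ebcdic))
```

None of these byte layouts is "the" letter A; they are all just containers for 65.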
UTF-8 is great for the reasons you list, and for a few others: it's a strict superset of ASCII, and it's byte-order neutral.
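Both properties are easy to check. The following Python sketch uses sample strings of my own choosing; it compares the ASCII and UTF-8 encodings of plain text, then encodes one non-ASCII character in UTF-8 and in both byte orders of UTF-16.

```python
# 1. Superset of ASCII: pure ASCII text has identical bytes in both encodings.
ascii_text = "Just another Perl hacker,"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# 2. Byte-order neutral: UTF-8 is defined as a sequence of bytes, so there
#    is only one possible serialization. UTF-16 uses 16-bit units, so the
#    bytes depend on the endianness chosen.
ch = "\u00e9"                        # LATIN SMALL LETTER E WITH ACUTE
utf8 = ch.encode("utf-8")            # one form only
utf16_le = ch.encode("utf-16-le")    # little-endian form
utf16_be = ch.encode("utf-16-be")    # big-endian form

print(same_bytes)
print(utf8.hex(), utf16_le.hex(), utf16_be.hex())
```

That is why UTF-16 and UTF-32 data may carry a byte order mark to say which way round they are, while UTF-8 has no byte order to signal.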
—John