If I understand the Unicode spec properly, there's an important distinction between Unicode code points (what we tend to think of as characters) and Unicode encodings, e.g. UTF-8. The current version of Unicode defines "only" the code points 0 through 0x10FFFF (1,114,112 possible values), which they claim should be more than enough to handle every character in every modern and historical language ever written.
There are then a variety of transformation formats (UTF-8, UTF-16 and UTF-32 being the usual ones) defined for representing Unicode code points as actual bytes/octets.
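To make the code point vs. encoding distinction concrete, here is a minimal sketch in Python (not something from the thread itself) that takes a few code points and shows the bytes each one becomes under three transformation formats. The helper name `encoded_forms` and the constant `MAX_CODE_POINT` are made up for illustration; the codec names are Python's standard ones, and the big-endian variants are chosen only so that no byte-order mark gets prepended.

```python
# Upper end of the Unicode code space; code points run from 0 to 0x10FFFF,
# which gives 0x110000 possible values in total.
MAX_CODE_POINT = 0x10FFFF


def encoded_forms(ch: str) -> dict[str, bytes]:
    """Return the bytes for one character in three transformation formats."""
    # Hypothetical helper: the same code point, three different byte layouts.
    return {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}


# One sample from each UTF-8 length class (1, 2, 3 and 4 bytes).
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    forms = encoded_forms(ch)
    print(f"U+{ord(ch):04X}", {name: data.hex(" ") for name, data in forms.items()})
```

The code point is the same abstract number in every column; only the byte representation changes. UTF-8 uses one to four bytes depending on the value, UTF-16 uses two bytes or a four-byte surrogate pair, and UTF-32 always uses four.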
In reply to Re: Re: Re: Re: How are regex character classes implemented?
by seattlejohn
in thread How are regex character classes implemented?
by John M. Dlugosz