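11111110 is the largest byte count the first byte can encode: its seven leading 1 bits mean a seven-byte sequence, so the lead byte itself carries no payload and is followed by 6 continuation bytes of 6 bits each, or 36 bits total.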
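The arithmetic can be checked mechanically. Below is a minimal Python sketch that assumes the lead-byte pattern of the original UTF-8 design, extended one step further to 0xFE as some implementations do (RFC 3629 itself stops at 4 bytes). Under that pattern the number of leading 1 bits is the total sequence length, and each continuation byte is 10xxxxxx. The helpers `lead_byte_pattern`, `payload_bits`, and `encode_extended` are hypothetical names used only for illustration.

```python
# Hypothetical helpers illustrating the original UTF-8 lead-byte pattern,
# extended to the 0xFE lead byte (not valid under RFC 3629, which stops at
# 4 bytes / U+10FFFF). Assumption: the number of leading 1 bits in the lead
# byte is the total sequence length; each continuation byte is 10xxxxxx.

def lead_byte_pattern(total_bytes: int) -> str:
    """Bit pattern of the lead byte; 'x' marks a payload bit."""
    if total_bytes == 1:
        return "0xxxxxxx"
    # n ones, a terminating 0, then payload bits, truncated to 8 bits
    return ("1" * total_bytes + "0" + "x" * 8)[:8]

def payload_bits(total_bytes: int) -> int:
    """Payload bits in the lead byte plus 6 per continuation byte."""
    return lead_byte_pattern(total_bytes).count("x") + 6 * (total_bytes - 1)

def encode_extended(cp: int) -> bytes:
    """Encode cp using the shortest sequence (1..7 bytes) that can hold it."""
    if cp < 0:
        raise ValueError("negative value")
    n = next((k for k in range(1, 8) if cp < (1 << payload_bits(k))), None)
    if n is None:
        raise ValueError("needs more than 36 bits")
    if n == 1:
        return bytes([cp])
    tail = []
    for _ in range(n - 1):  # collect 6-bit groups from the low end
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    lead = ((0xFF << (8 - n)) & 0xFF) | cp  # n leading 1 bits + leftover bits
    return bytes([lead] + tail[::-1])       # low-order group ends up last

for n in range(1, 8):
    print(n, lead_byte_pattern(n), payload_bits(n))
# last line printed: 7 11111110 36
```

For values in the standard range the sketch agrees with Python's own encoder, e.g. `encode_extended(0x1F600) == "\U0001F600".encode("utf-8")`.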

If I understand the Unicode spec properly, there's an important distinction between Unicode code points (what we tend to think of as characters) and Unicode encodings, e.g. UTF-8. The current version of Unicode defines "only" the code points U+0000 through U+10FFFF (0x110000, or 1,114,112, possible characters), which they claim should be more than enough to handle every character in every modern and historical language ever written.
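Python's `str` type works in terms of code points, so the size of the code space is easy to check. A small sketch; `MAX_CODE_POINT` and `TOTAL_CODE_POINTS` are just illustrative names:

```python
# Unicode code points run from U+0000 to U+10FFFF inclusive.
MAX_CODE_POINT = 0x10FFFF
TOTAL_CODE_POINTS = MAX_CODE_POINT + 1

print(TOTAL_CODE_POINTS)                     # 1114112
print(chr(MAX_CODE_POINT) == "\U0010FFFF")   # True: highest valid code point

# One past the top of the range is rejected by chr().
try:
    chr(MAX_CODE_POINT + 1)
except ValueError:
    print("chr() rejects values above U+10FFFF")
```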

There are then several transformation formats defined for representing Unicode code points as actual bytes/octets, such as UTF-8, UTF-16, and UTF-32.
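To make the code point vs. encoding distinction concrete, here is a short Python sketch that encodes the same few code points with three transformation formats. It uses the big-endian codec names so no byte-order mark is added; `encoded_forms` is a hypothetical helper.

```python
ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")

def encoded_forms(ch: str) -> dict:
    """Map each encoding name to the bytes that represent ch."""
    return {enc: ch.encode(enc) for enc in ENCODINGS}

for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    forms = encoded_forms(ch)
    print(f"U+{ord(ch):04X}", *(forms[e].hex(" ") for e in ENCODINGS), sep=" | ")

# U+0041 | 41 | 00 41 | 00 00 00 41
# U+00E9 | c3 a9 | 00 e9 | 00 00 00 e9
# U+20AC | e2 82 ac | 20 ac | 00 00 20 ac
# U+1F600 | f0 9f 98 80 | d8 3d de 00 | 00 01 f6 00
```

The code point is the same in every row; only its byte representation changes. UTF-8 is variable-length (1 to 4 bytes here), UTF-16 needs a surrogate pair above U+FFFF, and UTF-32 always uses 4 bytes.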


In reply to Re: Re: Re: Re: How are regex character classes implemented? by seattlejohn
in thread How are regex character classes implemented? by John M. Dlugosz