More info: UTF-8 as implemented in perl requires a second byte for codepoints 0x80 and higher, a third byte at 0x800, a fourth at 0x10000, a fifth at 0x200000, a sixth at 0x4000000 and a seventh at 0x80000000.
Note that this extends beyond the defined Unicode range, since we may store things other than Unicode characters in our strings - perl supports any integer that fits in a UV (32-bit or 64-bit unsigned integer, depending on your perl build) as a codepoint.
If I understand the code correctly (Perl_uvuni_to_utf8_flags() in utf8.c), higher codepoints (available only where perl is compiled with 64-bit integer support) use 7 bytes below 0x1000000000, and a fixed 13 bytes from there upwards.
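To make the thresholds easier to check, here is a rough sketch that turns the list above into a lookup. It is written in Python only so it can be run anywhere; it is not perl's code from utf8.c, and the names `THRESHOLDS` and `perl_utf8_length` are made up for this illustration. The numbers are exactly the ones quoted above, so if my reading of the 7-byte and 13-byte cases is off, the sketch is off in the same way.

```python
# Each entry is (first codepoint of the range, bytes needed from there on).
# The values are the thresholds quoted in the post, not read out of perl itself.
THRESHOLDS = [
    (0x0, 1),
    (0x80, 2),
    (0x800, 3),
    (0x10000, 4),
    (0x200000, 5),
    (0x4000000, 6),
    (0x80000000, 7),
    (0x1000000000, 13),  # assumed fixed 13-byte form, 64-bit builds only
]


def perl_utf8_length(codepoint: int) -> int:
    """Return the number of bytes the post says perl needs for this codepoint."""
    if codepoint < 0:
        raise ValueError("codepoint must be a non-negative integer")
    length = 1
    # Thresholds are in ascending order, so the last one reached wins.
    for start, nbytes in THRESHOLDS:
        if codepoint >= start:
            length = nbytes
    return length


if __name__ == "__main__":
    for cp in (0x7F, 0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000):
        print(f"0x{cp:X}: {perl_utf8_length(cp)} byte(s)")
```

Within the defined Unicode range (up to 0x10FFFF) this agrees with standard UTF-8, so the result can be cross-checked against any ordinary UTF-8 encoder; only the 5-byte and longer forms are specific to perl's extended encoding.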
Hugo
In reply to Re^2: generate character string based on byte count !!
by hv
in thread generate character string based on byte count !!
by barathbr