Well, UTF-8 is an encoding of Unicode code points. There are
many thousands of Unicode code points defined, and in Perl
there's room for 2**48 of them. ISO-8859-8 is a single-byte
encoding, so it has at most 256. You have a potential problem
here: most characters that UTF-8 can represent have no
equivalent in ISO-8859-8. However, you may want to look at
the Encode module that comes with perl 5.8.0.
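
A minimal sketch of such a conversion, assuming the input is a string of UTF-8 octets and that dying on an unmappable character is the behaviour you want. The helper name utf8_to_hebrew is made up for illustration; decode, encode and FB_CROAK are from Encode itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn UTF-8 octets into ISO-8859-8 octets.
# FB_CROAK makes Encode die on malformed input or on a character
# that ISO-8859-8 cannot represent, instead of substituting one.
sub utf8_to_hebrew {
    my ($octets) = @_;
    my $chars = decode('UTF-8', $octets, Encode::FB_CROAK);
    return encode('iso-8859-8', $chars, Encode::FB_CROAK);
}

# U+05D0 HEBREW LETTER ALEF is D7 90 in UTF-8 and E0 in ISO-8859-8.
my $alef = utf8_to_hebrew("\xD7\x90");
printf "0x%02X\n", ord $alef;    # prints 0xE0

# U+20AC EURO SIGN (E2 82 AC in UTF-8) has no ISO-8859-8 equivalent.
my $ok = eval { utf8_to_hebrew("\xE2\x82\xAC"); 1 };
print "euro sign cannot be converted\n" unless $ok;
```

If you would rather lose information than die, Encode also accepts other fallback values for that third argument; see its documentation for the choices.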
Abigail
2**48? Just curious as to the rationale for this. I thought Unicode supported only about 1.1 million code points (0-0x10FFFF) and ISO 10646, 2**32 (or maybe it was 2**31) code points. Is there another charset/encoding standard that requires more, is this a consequence of something internal to Perl, or am I misinformed?
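It's just a natural extension - a UTF-8 character can take
up to 6 bytes, which is 48 bits of storage. It is well short
of 2**48 code points of course, as many bit patterns are
illegal: the marker bits in the lead and continuation bytes
leave 31 bits of payload, so the ceiling for a 6-byte
sequence is nearer 2**31.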
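
As a rough check on the 6-byte figure, here is a small sketch that counts the bits actually available for the code point. It assumes the original 1-to-6-byte UTF-8 layout (a lead byte carrying the length marker, followed by continuation bytes of 6 payload bits each); payload_bits is a name invented for this example, and nothing here inspects Perl's internals.

```perl
use strict;
use warnings;

# Payload bits in an n-byte sequence under the 1..6 byte UTF-8 layout.
# A single byte is 0xxxxxxx (7 bits). For n >= 2 the lead byte spends
# n one-bits and a zero-bit on the length marker, leaving 7 - n bits,
# and each of the n - 1 continuation bytes (10xxxxxx) carries 6 bits.
sub payload_bits {
    my ($n) = @_;
    die "length must be 1..6\n" if $n < 1 || $n > 6;
    return 7 if $n == 1;
    return (7 - $n) + 6 * ($n - 1);
}

for my $n (1 .. 6) {
    printf "%d byte(s): %2d raw bits, %2d payload bits\n",
        $n, 8 * $n, payload_bits($n);
}
# The last line printed is: 6 byte(s): 48 raw bits, 31 payload bits
```

So under that layout six bytes occupy 48 bits but can name at most 2**31 distinct values.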
Abigail