Re: converting character types

Well, UTF-8 is an encoding for Unicode code points. There are thousands and thousands of Unicode code points defined. In Perl, there's room for 2**48 code points. ISO-8859-8 only has 256 code points. You have a potential problem here.

However, you may want to look at the Encode modules that come with perl 5.8.0.
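
For illustration, here is a minimal sketch of what such a conversion might look like with Encode. The input bytes are a made-up example (a Hebrew alef followed by a euro sign), chosen because the euro sign has no slot in ISO-8859-8, so it shows the potential problem directly.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: UTF-8 bytes for U+05D0 (Hebrew alef) and U+20AC (euro sign).
my $utf8_bytes = "\xD7\x90\xE2\x82\xAC";

# Turn the UTF-8 bytes into a Perl character string.
my $chars = decode('UTF-8', $utf8_bytes);

# Lenient conversion: with no CHECK argument, Encode substitutes a
# replacement character for anything ISO-8859-8 cannot represent.
my $lenient = encode('iso-8859-8', $chars);
printf "lenient result: %vX\n", $lenient;

# Strict conversion: FB_CROAK makes encode() die on an unmappable character.
# A copy is passed because encode() may modify its source when CHECK is set.
my $copy   = $chars;
my $strict = eval { encode('iso-8859-8', $copy, Encode::FB_CROAK) };
my $err    = $@;
print "strict conversion failed: $err" unless defined $strict;
```

Whether silent substitution or a hard failure is the better choice depends on how much data loss you can tolerate.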

Abigail

Re: Re: converting character types by seattlejohn (Deacon) on Jul 23, 2002 at 05:09 UTC
2**48? Just curious as to the rationale for this. I thought Unicode supported only about 1.1 million code points (0-0x10FFFF) and ISO 10646, 2**32 (or maybe it was 2**31) code points. Is there another charset/encoding standard that requires more, is this a consequence of something internal to Perl, or am I misinformed?
Re: converting character types by Abigail-II (Bishop) on Jul 23, 2002 at 09:16 UTC
It's just a natural extension - a UTF-8 character can take up to 6 bytes. It isn't quite 2**48 of course, as some bit patterns are illegal, but it's in this ballpark.

Abigail
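
To get a feel for how far the usable range falls below the raw 48 bits, here is a rough sketch that counts the payload bits per sequence length. It assumes the original UTF-8 layout with sequences of up to 6 bytes (as in RFC 2279), not whatever Perl does internally; `payload_bits` is a helper name made up for this example.

```perl
use strict;
use warnings;

# Payload bits in an n-byte sequence under the original UTF-8 layout
# (an assumption of this sketch): a single byte carries 7 bits; for
# n >= 2 the lead byte carries (7 - n) bits and each of the (n - 1)
# continuation bytes carries 6 bits.
sub payload_bits {
    my ($n) = @_;
    return $n == 1 ? 7 : (7 - $n) + 6 * ($n - 1);
}

for my $n (1 .. 6) {
    printf "%d byte(s): %2d payload bits\n", $n, payload_bits($n);
}
```

Under that layout a 6-byte sequence carries 31 payload bits; the remaining bits of the 48 are fixed marker bits in the lead and continuation bytes.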