in reply to converting character types

Well, UTF-8 is an encoding for Unicode code points. There are thousands and thousands of Unicode code points defined. In Perl, there's room for 2**48 code points. ISO-8859-8 is a single-byte character set, so it only has 256 code points. You have a potential problem here: any character outside those 256 has no representation in the target.

However, you may want to look at the Encode modules that come with perl 5.8.0.
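
As a rough sketch of what that could look like (assuming the input is a string of UTF-8 bytes; the helper name utf8_to_hebrew is made up for illustration), Encode's decode and encode can be chained, with FB_CROAK making the conversion die on anything ISO-8859-8 cannot hold:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert UTF-8 bytes to ISO-8859-8 bytes.
# Dies if the input is malformed or a character has no ISO-8859-8 mapping.
sub utf8_to_hebrew {
    my ($utf8_bytes) = @_;
    # LEAVE_SRC keeps Encode from modifying its input in place.
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;
    my $text  = decode('UTF-8', $utf8_bytes, $check);   # bytes -> characters
    return encode('iso-8859-8', $text, $check);         # characters -> bytes
}

# The Hebrew word "shalom" (U+05E9 U+05DC U+05D5 U+05DD) as UTF-8 bytes.
my $hebrew = utf8_to_hebrew("\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D");
print unpack('H*', $hebrew), "\n";    # prints f9ece5ed

# A CJK character (U+4E2D) is valid UTF-8 but has no slot in ISO-8859-8.
my $ok = eval { utf8_to_hebrew("\xE4\xB8\xAD"); 1 };
print "conversion failed: $@" unless $ok;
```

Whether to die, substitute a placeholder, or skip unmappable characters is a policy choice; Encode offers other CHECK values (such as FB_DEFAULT) for the lenient cases.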

Abigail

Re: Re: converting character types
by seattlejohn (Deacon) on Jul 23, 2002 at 05:09 UTC
    2**48? Just curious as to the rationale for this. I thought Unicode supported only about 1.1 million code points (0-0x10FFFF) and ISO 10646, 2**32 (or maybe it was 2**31) code points. Is there another charset/encoding standard that requires more, is this a consequence of something internal to Perl, or am I misinformed?
      It's just a natural extension: a UTF-8 character can take up to 6 bytes. It isn't quite 2**48 of course, as some bit patterns are illegal, but it's in this ballpark.
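
For anyone who wants to check the arithmetic, here is a small sketch that tallies payload bits per sequence length. It assumes the original 1-to-6-byte UTF-8 layout, where the lead byte carries 7, 5, 4, 3, 2 or 1 payload bits and each continuation byte carries 6; the names %lead_bits and payload_bits are illustrative only.

```perl
use strict;
use warnings;

# Payload bits in the lead byte for each sequence length:
# 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx, 111110xx, 1111110x
my %lead_bits = (1 => 7, 2 => 5, 3 => 4, 4 => 3, 5 => 2, 6 => 1);

# Total payload bits: lead byte bits plus 6 per continuation byte (10xxxxxx).
sub payload_bits {
    my ($len) = @_;
    return $lead_bits{$len} + 6 * ($len - 1);
}

for my $len (1 .. 6) {
    printf "%d byte(s): %2d payload bits\n", $len, payload_bits($len);
}
```

By this count the six-byte form carries 31 payload bits, so under that layout the ceiling is 2**31 code points rather than 2**48, which lines up with the ISO 10646 figure mentioned in the question.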

      Abigail