in reply to Unicode and You

Yes, I second your comments. Besides working correctly, Perl 5.8 also adds a couple of nifty features. For example, it lets you set the input and output character sets for a filehandle and takes care of all the necessary encoding for you. It also adds more alphabet/character classes for regexps.
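To illustrate, here is a minimal sketch of both features on Perl 5.8 or later. It assumes in-memory filehandles are acceptable for a demo; the variable names are made up for the example, and a real script would pass a file name instead of a scalar reference.

```perl
use strict;
use warnings;

# Two Chinese characters, given as code points so the source stays ASCII.
my $text = "\x{4e2d}\x{6587}";

# Write through an encoding layer into an in-memory "file".
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "open for write: $!";
print {$out} $text;
close($out) or die "close: $!";

# The buffer now holds the encoded bytes (3 bytes per character here).
my $byte_count = length $buffer;

# Read it back through the same layer; the data is decoded on the way in.
open(my $in, '<:encoding(UTF-8)', \$buffer) or die "open for read: $!";
my $read = <$in>;
close($in);

# length() counts characters on decoded data, and \p{Han} matches
# characters in the Han script.
my $char_count = length $read;
my $han_count  = () = $read =~ /\p{Han}/g;
```

The point is that the script itself never touches the byte-level encoding; the layer named in open() does the conversion in both directions.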

One thing I disliked about 5.6.1 was that there was no way to tell it that I wanted my input and output as UTF-8. In some situations, it kept treating my UTF-8-encoded input as raw 8-bit characters and tried to encode them as UTF-8 *again* when printing them to STDOUT... I did solve my problems, but it took me a while to find the workarounds.

Perl 5.6.0 was worse. I wrote a simple module to convert traditional Chinese characters to simplified ones (yes, I know there are two on CPAN already, but I had a good reason to do so), using a conversion table in a hash. For whatever reason, 5.6.0 would produce malformed characters, but only in some cases -- on 5.6.1 it works fine.
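For readers who haven't seen the approach, a hash-based conversion can be sketched roughly like this. This is not the module mentioned above; the two-entry table and the function name to_simplified are made up for illustration, and a real table would have thousands of entries.

```perl
use strict;
use warnings;

# Illustrative table only: traditional code point => simplified code point.
my %trad_to_simp = (
    "\x{570B}" => "\x{56FD}",
    "\x{5B78}" => "\x{5B66}",
);

# Replace each character that has a table entry; leave everything else alone.
sub to_simplified {
    my ($string) = @_;
    $string =~ s/(.)/exists $trad_to_simp{$1} ? $trad_to_simp{$1} : $1/ges;
    return $string;
}

my $converted = to_simplified("\x{570B}\x{5B78}A");
```

This relies on the regexp engine treating each multi-byte character as a single ".", which is exactly the kind of thing that varies between 5.6.0, 5.6.1 and 5.8.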

Now the only problem I'm facing is... writing my scripts so they will work well (or fail gracefully) with 5.8.0, 5.6.1, 5.6.0 etc...
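One way to approach this is to branch on $], the numeric version of the running interpreter. The sketch below is only an assumption about how one might structure it; the function name io_strategy and the strategy labels are invented for the example.

```perl
use strict;

# $] is the interpreter version as a number: 5.008 for 5.8.0,
# 5.006001 for 5.6.1, 5.00503 for 5.005_03.
# The optional argument exists so the logic can be tried with other versions.
sub io_strategy {
    my ($version) = @_;
    $version = $] unless defined $version;
    return 'layers'      if $version >= 5.008;  # I/O layers are available
    return 'manual-utf8' if $version >= 5.006;  # Unicode strings, no layers
    return 'bytes';                             # treat everything as raw bytes
}

my $strategy = io_strategy();

# Only ask for a UTF-8 layer where the interpreter is known to support it.
binmode(STDOUT, ':utf8') if $strategy eq 'layers';
```

The "fail gracefully" part would then be a die() with a readable message in whichever branch the script cannot support.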

Re: Re: Unicode and You
by belg4mit (Prior) on Aug 19, 2002 at 18:11 UTC
    >Now the only problem I'm facing is... writing my scripts so they will work well (or fail gracefully) with 5.8.0, 5.6.1,
    >5.6.0 etc...

    That's actually what I'm working on, only I'm keeping the span open for /5\.00\d/. Although I only have to handle input, not output. My solution has been to handle the raw bytes and do the Unicode conversions myself; it seems to work.

    --
    perl -pew "s/\b;([mnst])/'$1/g"

      Well, in my case it would mean doing my own UTF-8 conversion -- which would mean reimplementing a lot of code that's already there in later Perl versions... sounds a bit silly. But I guess it depends on your application.
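To give an idea of what doing the conversion yourself involves, here is a rough sketch of a UTF-8 decoder that works on raw bytes only. It is not the code from either post; the function name is invented, and it skips checks a real decoder needs (overlong forms, surrogates, out-of-range values).

```perl
use strict;

# Turn a string of raw UTF-8 bytes into a list of code points.
# Handles 1- to 4-byte sequences and dies on obviously broken input.
sub utf8_bytes_to_codepoints {
    my ($bytes) = @_;
    my @byte = unpack('C*', $bytes);
    my @codepoints;
    while (@byte) {
        my $lead = shift @byte;
        my ($need, $cp);
        # The lead byte says how many continuation bytes follow.
        if    ($lead < 0x80)           { ($need, $cp) = (0, $lead) }
        elsif (($lead & 0xE0) == 0xC0) { ($need, $cp) = (1, $lead & 0x1F) }
        elsif (($lead & 0xF0) == 0xE0) { ($need, $cp) = (2, $lead & 0x0F) }
        elsif (($lead & 0xF8) == 0xF0) { ($need, $cp) = (3, $lead & 0x07) }
        else { die sprintf("invalid lead byte 0x%02X\n", $lead) }
        # Each continuation byte contributes its low 6 bits.
        for (1 .. $need) {
            my $cont = shift @byte;
            die "truncated or malformed sequence\n"
                unless defined $cont && ($cont & 0xC0) == 0x80;
            $cp = ($cp << 6) | ($cont & 0x3F);
        }
        push @codepoints, $cp;
    }
    return @codepoints;
}

# "A" followed by the three bytes that encode U+4E2D.
my @cps = utf8_bytes_to_codepoints("A\xE4\xB8\xAD");
```

Since it only uses unpack and integer arithmetic, something like this does not depend on the interpreter's own Unicode support, which is the appeal for very old versions and also the reason it feels like duplicated effort on newer ones.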