Re: Unicode and You

Yes, I second your comments. Besides working correctly, perl 5.8 also adds a couple of nifty features. For example, it lets you conveniently set the input and output character sets for a filehandle and will take care of all the necessary encoding for you. And it adds more alphabet/character classes for regexps.
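To make the two features above concrete, here is a minimal sketch assuming Perl 5.8 or later. It writes and re-reads a temporary file through an `:encoding(UTF-8)` layer, then counts Han characters with a Unicode property class. The temporary file and the variable names are only illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative scratch file; removed automatically at exit.
my ($tmp, $file) = tempfile(UNLINK => 1);
close($tmp);

# The layer on the filehandle encodes characters to UTF-8 bytes on output.
open(my $out, '>:encoding(UTF-8)', $file) or die "Cannot write $file: $!";
print {$out} "\x{4E2D}\x{6587}\n";    # two Han characters
close($out) or die "Cannot close $file: $!";

# The same layer decodes UTF-8 bytes back into characters on input.
open(my $in, '<:encoding(UTF-8)', $file) or die "Cannot read $file: $!";
my $line = <$in>;
close($in);
chomp $line;

# Unicode property class in a regexp: count the Han characters.
my $han_count = () = $line =~ /\p{Han}/g;

binmode(STDOUT, ':encoding(UTF-8)');
print "characters: ", length($line), ", Han: $han_count\n";
```

Because the layer sits on the filehandle, the rest of the script only ever sees characters, not bytes.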

One thing I disliked about 5.6.1 was that it was impossible to tell it that I wanted my input and output as UTF-8. In some situations, it kept treating my UTF-8-encoded input as raw 8-bit characters and tried to encode them as UTF-8 *again* when printing them to STDOUT... I did solve my problems in the end, but it took me a while to work around this.
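The double-encoding effect can be reproduced on purpose. This is only a sketch using the Encode module that ships with Perl 5.8 and later, so it illustrates the symptom rather than what 5.6.1 did internally. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $raw = "\xE4\xB8\xAD";    # the three UTF-8 bytes of U+4E2D

# Mistake: the bytes are taken as three 8-bit characters and encoded again.
my $double = encode('UTF-8', $raw);

# Correct: decode the bytes to one character first, then encode on output.
my $chars = decode('UTF-8', $raw);
my $round = encode('UTF-8', $chars);

printf "raw: %d bytes, double-encoded: %d bytes, round trip intact: %s\n",
    length($raw), length($double), ($round eq $raw ? 'yes' : 'no');
```

Each of the three input bytes is above 0x7F, so each becomes a two-byte sequence when encoded a second time.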

perl 5.6.0 was worse. I wrote a simple module to convert traditional Chinese characters to simplified ones (yes, I know there are two on CPAN already, but I had a good reason to do so), using a conversion table in a hash. For whatever reason, 5.6.0 would produce malformed characters, but only in some cases -- on 5.6.1 it works fine.

Now the only problem I'm facing is... writing my scripts so they will work well (or fail gracefully) with 5.8.0, 5.6.1, 5.6.0 etc...
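One possible way to approach this is to check `$]` at run time and only ask for a UTF-8 layer where it exists. The sketch below assumes nothing older than 5.6 has to compile it, since two-argument `binmode` is not available before that. The helper name `set_utf8_output` is hypothetical.

```perl
use strict;

# Hypothetical helper: request UTF-8 output on a filehandle if this perl
# supports the :utf8 layer (5.8 and later). Returns 1 on success, 0 otherwise.
sub set_utf8_output {
    my ($fh) = @_;
    if ($] >= 5.008) {
        binmode($fh, ':utf8');
        return 1;
    }
    # Older perl: leave the handle alone and let the caller decide.
    return 0;
}

unless (set_utf8_output(\*STDOUT)) {
    warn "No UTF-8 output layer on perl $]; output is written as raw bytes\n";
}
```

Returning a status instead of dying leaves the "fail gracefully" decision to the calling script.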

Re: Re: Unicode and You by belg4mit (Prior) on Aug 19, 2002 at 18:11 UTC
>Now the only problem I'm facing is... writing my scripts so they will work well (or fail gracefully) with 5.8.0, 5.6.1, 5.6.0 etc...

That's actually what I'm working on, only I'm keeping the span open for /5\.00\d/. Although I only have to handle input, not output. My solution has been to handle the raw bytes and do the Unicode conversions myself; it seems to work.

`-- perl -pew "s/\b;([mnst])/'$1/g"`
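For illustration, handling the raw bytes by hand might look roughly like the following. This is a sketch, not belg4mit's actual code: the helper name `utf8_bytes_to_codepoints` is hypothetical, and it checks continuation bytes but does not reject overlong forms or surrogates.

```perl
use strict;

# Hypothetical helper: decode a string of UTF-8 bytes into a list of
# numeric code points without relying on perl's own Unicode support.
sub utf8_bytes_to_codepoints {
    my ($bytes) = @_;
    my @octets = unpack('C*', $bytes);
    my @codepoints;
    while (@octets) {
        my $lead = shift @octets;
        my ($extra, $cp);
        # The lead byte says how many continuation bytes follow.
        if    ($lead < 0x80)           { ($extra, $cp) = (0, $lead) }
        elsif (($lead & 0xE0) == 0xC0) { ($extra, $cp) = (1, $lead & 0x1F) }
        elsif (($lead & 0xF0) == 0xE0) { ($extra, $cp) = (2, $lead & 0x0F) }
        elsif (($lead & 0xF8) == 0xF0) { ($extra, $cp) = (3, $lead & 0x07) }
        else { die sprintf("Invalid UTF-8 lead byte 0x%02X\n", $lead) }
        for (1 .. $extra) {
            my $cont = shift @octets;
            die "Truncated or malformed UTF-8 sequence\n"
                unless defined $cont && ($cont & 0xC0) == 0x80;
            # Each continuation byte contributes its low six bits.
            $cp = ($cp << 6) | ($cont & 0x3F);
        }
        push @codepoints, $cp;
    }
    return @codepoints;
}

my @cps = utf8_bytes_to_codepoints("A\xE4\xB8\xAD");
print join(' ', map { sprintf 'U+%04X', $_ } @cps), "\n";    # U+0041 U+4E2D
```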
Re: Unicode and You by crenz (Priest) on Aug 20, 2002 at 14:34 UTC
Well, in my case it would mean doing my own UTF-8 conversion -- which would mean reimplementing a lot of code that's already there in later Perl versions... sounds a bit silly. But I guess it depends on your application.