in reply to "use encoding" behaviour change under Perl 5.10?

I wish I could answer your question, but all I can do is point out that comparing the "encoding.pm" and "utf8.pm" files shipped with 5.8.8 and 5.10 doesn't seem to help. Apart from whitespace changes, the only real differences are in the POD, and those don't shed any light on this issue.

In one respect, it may be that 5.10's behavior is "consistent" in a way that 5.8's behavior is not:

    # in perl 5.8.8:
    perl -Mencoding=utf8 -e '$a="\x51";$b="\xE1"; printf( "a=%x b=%x\n",ord($a),ord($b))'
    a=51 b=fffd
    perl -Mencoding=utf8 -e '$a=chr(hex("51"));$b=chr(hex("E1")); printf( "a=%x b=%x\n",ord($a),ord($b))'
    a=51 b=e1
In 5.10, those two commands both produce "b=fffd". But knowing that probably doesn't help either (sorry).
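For contrast, here is a minimal sketch of the same comparison with no "use encoding" in effect. Under plain Perl, and equally under the lexically scoped "use utf8" (which only says the source file itself is UTF-8), the \x escape and chr(hex(...)) both yield the character U+00E1. That suggests the fffd result comes from the encoding pragma itself; the variable names are just illustrative.

```perl
use strict;
use warnings;
use utf8;    # lexically scoped: only declares that this source file is UTF-8

# The same two ways of building character 0xE1 as in the one-liners above
my $from_escape = "\xE1";
my $from_chr    = chr(hex("E1"));

printf "escape=%x chr=%x\n", ord($from_escape), ord($from_chr);
# prints "escape=e1 chr=e1"
print "same character\n" if $from_escape eq $from_chr;
```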

While it's probably true that maintaining your own local version of CGI::Util will be "easier", it might still make sense to consider whether there's a better way to handle this issue:

I cannot easily remove the 'use encoding "utf8"' from my script as the pragma does "some magic" that prevents encoding-related disasters (double-encoded strings) further down the road.

Figuring out what that "magic" is and where it applies in your code, and then looking for better ways to achieve the same result, might pay off in the long run. In particular, since "use encoding" is not lexically scoped (it applies to the whole script rather than to a block), replacing it with a scoped and/or more focused approach to your encoding problems seems prudent and worthwhile.
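One possible shape for such a focused replacement, offered as a sketch rather than a drop-in fix: decode bytes into characters once at the input boundary, work with characters internally, and encode once on output, using the Encode module. The byte string $raw is a hypothetical stand-in for data read from a file or CGI parameter, and whether this covers everything "use encoding" was doing for your script is an assumption you would need to check.

```perl
use strict;
use warnings;
use utf8;    # lexically scoped: non-ASCII literals in this file are read as UTF-8
use Encode qw(decode encode);

# Hypothetical input: UTF-8 octets as they might arrive from a file or form field
my $raw = "caf\xC3\xA9";            # 5 octets

# Decode once, at the input boundary, into a character string
my $text = decode('UTF-8', $raw);   # 4 characters, the last one U+00E9
printf "octets in: %d, characters: %d\n", length($raw), length($text);

# ... character-level work goes here (regexes, length, substr, etc.) ...
my $upper = uc $text;               # "CAF\x{C9}"

# Encode once, at the output boundary
my $out = encode('UTF-8', $upper);
printf "octets out: %d\n", length($out);
binmode STDOUT;                     # $out is already bytes; write it untouched
print $out, "\n";
```

Because each conversion happens at one explicit point, double-encoding shows up as a bug in a specific place rather than depending on a script-wide pragma.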

Re^2: "use encoding" behaviour change under Perl 5.10?
by gnosek (Sexton) on Mar 21, 2009 at 17:54 UTC

    Thanks for your example. I understand that "use encoding" applies to the program source (string literals etc.), but the fact that it also affects the chr function came as quite a surprise to me.

    As for replacing it with something more manageable, I certainly will, but I needed a workaround right away and couldn't afford to dig into the code just then.

      From perldoc perlunicode:

      The "chr()" and "ord()" functions work on characters, similar to "pack("W")" and "unpack("W")", not "pack("C")" and "unpack("C")". "pack("C")" and "unpack("C")" are methods for emulating byte-oriented "chr()" and "ord()" on Unicode strings. While these methods reveal the internal encoding of Unicode strings, that is not something one normally needs to care about at all.
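A small sketch of the distinction this passage draws: chr and ord operate on characters, while an encoded form is a separate byte string. Here the UTF-8 bytes are produced explicitly with Encode::encode rather than by peeking at perl's internal representation; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char  = chr(0xE1);                # one character, U+00E1
my $bytes = encode('UTF-8', $char);   # its UTF-8 encoding, as a byte string

printf "ord of character: %x\n", ord($char);   # prints "ord of character: e1"
printf "characters: %d, octets: %d\n", length($char), length($bytes);
# prints "characters: 1, octets: 2"
printf "octets: %s\n", join ' ', map { sprintf '%02x', $_ } unpack('C*', $bytes);
# prints "octets: c3 a1"
```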

      I'm no expert in this at all, but I hope the following helps point you (or others) in a useful direction:

      perl -Mencoding=utf8 -le 'print unpack "C", chr 156'
      156
      
      perl -Mencoding=utf8 -M'Encode qw(from_to)' -le '$c = chr 156; from_to($c, "iso-8859-3", "utf-8"); print ord $c'
      156
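For comparison, here is a minimal sketch of Encode::from_to on a plain byte string with no encoding pragma loaded. from_to converts octets in place from one encoding to another and returns the length of the result in octets (undef on failure), so it is meant for byte strings rather than character strings. The choice of ISO-8859-1 here is just an example.

```perl
use strict;
use warnings;
use Encode qw(from_to);

my $data = "\xE1";                                  # one octet: "a acute" in ISO-8859-1
my $len  = from_to($data, 'iso-8859-1', 'utf-8');   # converts in place; returns new length
printf "length %d: %s\n", $len, join ' ', map { sprintf '%02x', $_ } unpack('C*', $data);
# prints "length 2: c3 a1"
```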