samtregar has asked for the wisdom of the Perl Monks concerning the following question:

Hello all. I'm maintaining a Perl app which relies on a possibly incorrect behavior in Encode::decode_utf8 - references are expected to pass through unmangled. This worked fine until a recent upgrade. Observe Encode v2.08 with Perl v5.6.1:

$ perl -MEncode -MData::Dumper -e \
    'my $ref = Encode::decode_utf8({ foo => 1}); print Dumper($ref);'
$VAR1 = {
          'foo' => 1
        };

However, Encode v2.18 with Perl v5.6.1 is not so forgiving:

$ perl -MEncode -MData::Dumper -e \
    'my $ref = Encode::decode_utf8({ foo => 1}); print Dumper($ref);'
$VAR1 = 'HASH(0x9932180)';

So, is this a bug in Encode or a bug in my app? I'm leaning towards the latter but I thought I'd check with you before I started trying to fix it (no tests, argh!). Aside from "don't do that", can you suggest a fix?
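The newer behavior is easy to confirm. This is a minimal check that assumes a current Encode release, which (like the v2.18 shown above) treats its argument as a string. It does not assume either of the exact versions quoted in the question.

```perl
use strict;
use warnings;
use Encode ();

my $href   = { foo => 1 };
my $result = Encode::decode_utf8($href);

# With current Encode releases the reference is used as a string, so
# $result is a plain scalar like "HASH(0x...)", not the original hash.
printf "ref(\$result) = '%s'\n", ref $result;
print  "result        = $result\n";
```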

Thanks,
-sam

PS: I also submitted this to the perl-unicode mailing list. I'll post anything useful I learn there to this node.

Re: Encode::decode_utf8 and references
by rhesa (Vicar) on Jun 17, 2006 at 21:55 UTC
    I noticed the same thing recently, under Perl v5.8.8.

    I use decode_utf8 in a custom override of CGI's param() method to get properly flagged Unicode strings. When I installed Encode 2.08, file uploads broke because the incoming filehandles got mangled. I fixed the issue inside my custom param() method, but I suppose you could also brute-force it by overriding decode_utf8:

    {
        no warnings 'redefine';
        my $decode_utf8_orig = \&Encode::decode_utf8;
        *Encode::decode_utf8 = sub {
            return $_[0] if ref $_[0];   # avoid mangling references.
            goto &$decode_utf8_orig;     # original behavior for normal scalars.
        }
    }
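One caveat applies to overriding at run time. `use Encode` exports `decode_utf8` by copying the code reference into the caller's package when the import happens. An override installed afterwards therefore only affects fully qualified `Encode::decode_utf8(...)` calls. The sketch below installs the same pass-through wrapper in a `BEGIN` block, before the import, so that unqualified calls see it too.

It assumes that Encode declares `decode_utf8` with a `($;$)` prototype, as current releases do. The wrapper copies that prototype to avoid a "Prototype mismatch" warning.

```perl
use strict;
use warnings;

BEGIN {
    require Encode;
    no warnings 'redefine';
    my $orig = \&Encode::decode_utf8;
    # Same ($;$) prototype as Encode's own decode_utf8 (assumed, see above).
    *Encode::decode_utf8 = sub ($;$) {
        return $_[0] if ref $_[0];   # hand references back untouched
        goto &$orig;                 # plain scalars: original behavior
    };
}

# Exporter copies the code reference now, so this imports the wrapper.
use Encode qw(decode_utf8);

my $href = { foo => 1 };
my $same = decode_utf8($href);           # unqualified call
my $text = decode_utf8("caf\xc3\xa9");   # UTF-8 bytes for "caf\x{e9}"
```

If the override lived in a separate module instead, that module would have to be loaded before any package that imports `decode_utf8`.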
Re: Encode::decode_utf8 and references
by Anonymous Monk on Jun 17, 2006 at 21:51 UTC
    use strict;
    no warnings 'redefine';
    use Encode;
    use Data::Dumper;

    our $REAL;
    BEGIN { $REAL = \&Encode::decode_utf8; }

    sub Encode::decode_utf8 ($;$) {
        my $hash = shift;
        for (keys %{$hash}) {
            $hash->{$_} = $REAL->($hash->{$_})
        }
        return $hash;
    }

    my $ref = Encode::decode_utf8({ foo => 1});
    print Dumper($ref);
      Mmmm, yeah, that might work. Of course I'd have to deal with the more common case where the value passed isn't a hash-ref.
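To cover the case where the value passed isn't a hashref, the hash-only override above could be generalized. The sketch below uses a hypothetical helper, `decode_utf8_any`, which is not part of Encode. It decodes plain strings. For a hashref or arrayref, it decodes the top-level plain-string elements in place, as that override does. Any other reference is returned unchanged.

It deliberately does not recurse into nested structures.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode).
sub decode_utf8_any {
    my ($value) = @_;

    # Decode a single item only if it is a defined, non-reference scalar.
    my $decode_one = sub {
        my ($item) = @_;
        return (defined $item && !ref $item) ? Encode::decode_utf8($item) : $item;
    };

    if (ref($value) eq 'HASH') {
        $_ = $decode_one->($_) for values %$value;   # values() aliases the hash values
        return $value;
    }
    if (ref($value) eq 'ARRAY') {
        $_ = $decode_one->($_) for @$value;          # modifies elements in place
        return $value;
    }
    return $decode_one->($value);   # plain scalar, or some other reference
}

my $bytes = "caf\xc3\xa9";   # UTF-8 encoding of "caf\x{e9}"
my $h     = decode_utf8_any({ foo => $bytes });
my $aref  = decode_utf8_any([ $bytes, 1 ]);
my $s     = decode_utf8_any($bytes);
my $code  = sub { 42 };
my $c     = decode_utf8_any($code);
```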

      -sam

Re: Encode::decode_utf8 and references
by graff (Chancellor) on Jun 18, 2006 at 01:16 UTC
    Um, just curious about this, but... what is the point (the purpose, benefit, short-cut or whatever) of passing a hashref to Encode::decode_utf8()?

    If you were actually expecting to set the utf8 flags on the keys and values of the hash that is being passed by reference, that would simply be a misconception in your code. (But I gather that was not the intention.)

    If you're just doing something with a list of who-knows-what and passing every element of the list to decode_utf8 regardless of what it might be, well, that seems a bit silly. You didn't show an actual snippet from the application in question, but if your code looks something like this:

    @olist = map { decode_utf8( $_ ) } @ilist;
    it might be better to add a little bit to make it sensible, like this:
    @olist = map { ref($_) ? $_ : decode_utf8( $_ ) } @ilist;
    Given what you've said, I'd suggest that you treat the problem as a bug in your app -- functions intended for strings should not be applied to references. (You wouldn't pass a hashref to substr() or index(), would you?)
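Here is the guarded `map` applied to a mixed list. The input holds raw UTF-8 bytes and a filehandle, the kind of value a file-upload field can hold. An in-memory handle stands in for a real upload here, which is an assumption made for illustration. The string comes back decoded, and the handle comes back as the same, still-readable reference.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# An in-memory filehandle stands in for an upload handle.
my $buffer = "file contents";
open my $fh, '<', \$buffer or die "open: $!";

my @ilist = ("caf\xc3\xa9", $fh);   # UTF-8 bytes, then a reference

# Leave references alone, decode everything else.
my @olist = map { ref($_) ? $_ : decode_utf8($_) } @ilist;
```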
      Um, just curious about this, but... what is the point (the purpose, benefit, short-cut or whatever) of passing a hashref to Encode::decode_utf8()?

      Your guess is as good as mine. I didn't write this code; I just maintain it. If there's no easy way to restore the behavior of prior versions, I'll probably end up either overriding it (à la the other suggestions) or auditing the code for bad calls to decode_utf8().
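For the auditing route, one option is a temporary wrapper. It keeps the old pass-through behavior but emits a backtrace every time a reference arrives, so each offending call site shows up in the logs. This is a sketch, and its message text is made up.

It applies to fully qualified `Encode::decode_utf8(...)` calls and to imports made after it runs. It assumes Encode's `($;$)` prototype.

```perl
use strict;
use warnings;
use Carp ();
use Encode ();

{
    no warnings 'redefine';
    my $orig = \&Encode::decode_utf8;
    *Encode::decode_utf8 = sub ($;$) {
        if (ref $_[0]) {
            # cluck warns with a full stack backtrace, which shows the caller.
            Carp::cluck('decode_utf8() called with a ' . ref($_[0]) . ' reference');
            return $_[0];            # keep the old pass-through behavior
        }
        goto &$orig;                 # plain scalars: decode as usual
    };
}
```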

      (You wouldn't pass a hashref to substr() or index(), would you?)

      No, I wouldn't. But to be fair to the original coder, this did work for the entire time he was on the project. I doubt that any of us write code that's safe from all possible changes that could be justified in the future!

      -sam

Re: Encode::decode_utf8 and references
by Anonymous Monk on Jun 18, 2006 at 07:41 UTC
    This module requires perl5.7.3 or later.
      Huh - you're right. I meant to type 5.8.6!

      -sam
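Because the question hinged on which Perl and Encode versions were in play, a quick report of the versions actually loaded helps avoid the kind of mix-up above. This is a minimal sketch using only `$^V` and the standard `VERSION` method.

```perl
use strict;
use warnings;
use Encode ();

# Report the interpreter and Encode versions actually loaded.
my $report = sprintf "perl %vd, Encode %s", $^V, Encode->VERSION;
print "$report\n";
```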