in reply to Re^2: question about Encode::decode('iso-8859-1', ...)
in thread question about Encode::decode('iso-8859-1', ...)

I think you were trying to say that it makes no sense to decode something that's already been decoded, but that's got nothing to do with whether it's a "perl-internal utf8" buffer or not.

Based on having seen the result of this snippet:

perl -MEncode -e '$x="\x{0432}"; $y=decode("utf8",$x)'
my intention was to say that when you pass that sort of string to Encode::decode(), you get a run-time error. I had assumed that "that sort of string" was most easily understood as one whose utf8 flag was already on.
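
A minimal sketch of that failure, assuming only a stock Encode install. The eval wrapper and the variable names are illustrative additions, not part of the original one-liner, and the exact error text varies between Encode versions, so it is printed rather than matched:

```perl
use strict;
use warnings;
use Encode qw(decode);

# A character string holding one code point above 255
# (U+0432, CYRILLIC SMALL LETTER VE).
my $chars = "\x{0432}";

# decode() expects a string of bytes. A code point above 255 cannot be
# a byte, so the call dies; eval traps the error for inspection.
my $decoded = eval { decode("utf8", $chars) };
my $error   = $@;

print defined $decoded ? "decoded ok\n" : "decode failed: $error";
```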

You've shown that things are actually deeper and more complicated -- I added "is_utf8()" to your script, and confirmed that Encode::decode was working without croaking, with the input string's utf8 flag on as well as off.
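
A sketch of that check, not the script from the earlier reply: the byte values and variable names are assumed here for illustration. It builds two copies of the same two-byte string, one stored with the utf8 flag off and one with it on, records the flags with utf8::is_utf8(), and decodes both:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of U+0432, held as two characters that each fit in a byte.
my $bytes = "\xD0\xB2";

# Same characters, different internal storage.
my $bytes_dn = $bytes; utf8::downgrade($bytes_dn);   # utf8 flag off
my $bytes_up = $bytes; utf8::upgrade($bytes_up);     # utf8 flag on

# Record the flags before decoding.
my $flag_dn = utf8::is_utf8($bytes_dn) ? 1 : 0;
my $flag_up = utf8::is_utf8($bytes_up) ? 1 : 0;

# Both calls succeed because every character is in the range 0-255.
my $from_dn = decode("utf8", $bytes_dn);
my $from_up = decode("utf8", $bytes_up);

printf "flag off copy: is_utf8=%d, decoded to U+%04X\n", $flag_dn, ord $from_dn;
printf "flag on copy:  is_utf8=%d, decoded to U+%04X\n", $flag_up, ord $from_up;
```

What decides whether decode() croaks is therefore the presence of a character above 255, not the state of the flag.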

This is a surprising effect of the utf8::upgrade/downgrade functions, and I'm glad to know about it, though it goes a bit beyond the scope of the OP (and most applications that involve encoding issues).

Re^4: question about Encode::decode('iso-8859-1', ...)
by ikegami (Patriarch) on Mar 08, 2009 at 04:53 UTC

    This is a surprising effect of the utf8::upgrade/downgrade functions

    What surprising effect? Their purpose is to convert a scalar's internal encoding, and I used them for that purpose.

    If it helps clear up some confusion, change

    utf8::downgrade my $bin_dn = $bin; # UTF8=0
    utf8::upgrade my $bin_up = $bin; # UTF8=1

    to

    my $bin_dn = $bin; # UTF8=0
    chop my $bin_up = $bin . "\x{2660}"; # UTF8=1

    Practical use for utf8::upgrade: Ensure "Unicode semantics" are used in regex matches. (But note that work is being done to remove such dependencies on this internal information.)
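
    A small sketch of that regex dependency, assuming a script that does not enable the unicode_strings feature (for example via "use v5.12" or later) and does not use the /u, /a or /l modifiers. The test character and variable names are illustrative:

    ```perl
    use strict;
    use warnings;

    # U+00E9 (e with acute accent), stored two different ways.
    my $e_dn = "\xE9"; utf8::downgrade($e_dn);   # UTF8=0
    my $e_up = "\xE9"; utf8::upgrade($e_up);     # UTF8=1

    # The two strings compare equal, yet \w can treat them differently
    # because the default regex rules depend on the internal encoding.
    my $same       = $e_dn eq $e_up ? 1 : 0;
    my $dn_is_word = $e_dn =~ /\w/  ? 1 : 0;
    my $up_is_word = $e_up =~ /\w/  ? 1 : 0;

    print "equal strings: $same\n";
    print "downgraded matches \\w: $dn_is_word\n";
    print "upgraded matches \\w:   $up_is_word\n";
    ```

    Under those assumptions only the upgraded copy should match \w, which is the dependency on internal information mentioned above.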

    Practical use for utf8::downgrade: Ensure a string is a string of bytes (only contains chars 0-255), such as in Encode::decode and in Net::SFTP::Foreign::write.
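
    A sketch of that check using the optional second argument of utf8::downgrade (FAIL_OK), which makes it return false instead of dying when the string holds a character above 255. The sample strings and variable names are illustrative:

    ```perl
    use strict;
    use warnings;

    # Only characters in 0-255, but stored internally with UTF8=1.
    my $flagged = "\xD0\xB2";
    utf8::upgrade($flagged);

    # Contains a character above 255, so it cannot be a string of bytes.
    my $wide = "\x{0432}";

    # With a true second argument, failure is reported by the return value.
    my $flagged_ok = utf8::downgrade($flagged, 1) ? 1 : 0;
    my $wide_ok    = utf8::downgrade($wide, 1)    ? 1 : 0;

    print "flagged string is bytes: $flagged_ok\n";
    print "wide string is bytes:    $wide_ok\n";
    ```

    Without the second argument, utf8::downgrade dies on the wide string, which suits code that should refuse to pass non-byte data to a byte-oriented interface.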