Re^3: question about Encode::decode('iso-8859-1', ...)

I think you were trying to say that it makes no sense to decode something that's already been decoded, but that's got nothing to do with whether it's a "perl-internal utf8" buffer or not.

Based on having seen the result of this snippet:

perl -MEncode -e '$x="\x{0432}"; $y=decode("utf8",$x)'

my intention was to say that when you pass that sort of string to Encode::decode(), you get a run-time error. I had assumed that "that sort of string" was most easily understood as one whose utf8 flag was already on.

You've shown that things are actually deeper and more complicated -- I added "is_utf8()" to your script, and confirmed that Encode::decode runs without croaking whether the input string's utf8 flag is on or off.
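For anyone following along, here is a minimal sketch of that check. It is not the original script: the variable names and the choice of U+0432 as test data are my own assumptions. It feeds the same two bytes to Encode::decode once with the utf8 flag off and once with it on.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes of the UTF-8 encoding of U+0432 (assumed test data).
my $bin = "\xD0\xB2";

my $bin_dn = $bin; utf8::downgrade($bin_dn);   # utf8 flag off
my $bin_up = $bin; utf8::upgrade($bin_up);     # utf8 flag on

my @decoded;
for my $in ($bin_dn, $bin_up) {
    my $flag = utf8::is_utf8($in) ? 1 : 0;     # read the flag before decoding
    my $out  = decode('UTF-8', $in);
    push @decoded, $out;
    printf "UTF8=%d -> U+%04X (length %d)\n", $flag, ord($out), length($out);
}
```

Both iterations should report the same single decoded character, since the two inputs hold the same characters (all below 256) and differ only in internal storage.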

This is a surprising effect of the utf8::upgrade/downgrade functions, and I'm glad to know about it, though it goes a bit beyond the scope of the OP (and most applications that involve encoding issues).


Replies are listed 'Best First'.

Re^4: question about Encode::decode('iso-8859-1', ...)
by ikegami (Patriarch) on Mar 08, 2009 at 04:53 UTC

"This is a surprising effect of the utf8::upgrade/downgrade functions"

What surprising effect? Their purpose is to convert a scalar's internal encoding, and I used them for that purpose.

If it helps clear up some confusion, change

   utf8::downgrade my $bin_dn = $bin;  # UTF8=0
   utf8::upgrade   my $bin_up = $bin;  # UTF8=1
to

   my $bin_dn = $bin;                    # UTF8=0
   chop my $bin_up = $bin . "\x{2660}";  # UTF8=1
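A small sketch of why the two versions are interchangeable, assuming $bin holds a plain byte string (the value and the variable names below are made up for illustration): both routes leave the characters untouched and differ only in how the utf8 flag ends up set.

```perl
use strict;
use warnings;

my $bin = "\xD0\xB2";   # assumed sample bytes, all chars < 256

# Route 1: explicit upgrade of a copy.
my $via_upgrade = $bin;
utf8::upgrade($via_upgrade);

# Route 2: append a wide character (forcing the internal utf8 format),
# then chop it off again; the flag stays on.
chop(my $via_chop = $bin . "\x{2660}");

for my $s ($bin, $via_upgrade, $via_chop) {
    printf "UTF8=%d same_chars=%d length=%d\n",
        (utf8::is_utf8($s) ? 1 : 0), ($s eq $bin ? 1 : 0), length($s);
}
```

The point of the second route is only to show that nothing special happens in utf8::upgrade; ordinary string operations can put a scalar into the same internal state.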

Practical use for utf8::upgrade: Ensure "Unicode semantics" are used in regex matches. (But note that work is being done to remove such dependencies on this internal information.)
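To illustrate the regex case, here is a minimal sketch, assuming a script that does not enable the unicode_strings feature and uses no /u, /a or /l modifier; the sample character U+00E9 is my own choice. The same one-character string matches \w only once it is stored in the upgraded form.

```perl
use strict;
use warnings;

my $native = "\xE9";            # e-acute as a single char, utf8 flag off
my $upgraded = $native;
utf8::upgrade($upgraded);       # same character, utf8 flag on

# Under the default regex semantics, chars 128-255 in an unflagged
# string are not treated as Unicode word characters.
my $native_matches   = $native   =~ /\w/ ? 1 : 0;
my $upgraded_matches = $upgraded =~ /\w/ ? 1 : 0;

print "native:   $native_matches\n";
print "upgraded: $upgraded_matches\n";
print "equal as strings: ", ($native eq $upgraded ? 1 : 0), "\n";
```

The two scalars compare equal, yet only the upgraded one is matched by \w here, which is exactly the dependency on internal state mentioned above.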

Practical use for utf8::downgrade: Ensure a string is a string of bytes (only contains chars 0-255), such as in Encode::decode and in Net::SFTP::Foreign::write.
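A minimal sketch of that use, with a hypothetical helper name (as_bytes is not part of any module): utf8::downgrade with a true second argument (FAIL_OK) returns false instead of dying when the string contains a character above 255, so it doubles as a "is this really a byte string?" check.

```perl
use strict;
use warnings;

# Hypothetical helper: return a downgraded copy of the argument,
# or undef if it contains characters that do not fit in a byte.
sub as_bytes {
    my ($s) = @_;                        # work on a copy
    utf8::downgrade($s, 1) or return;    # FAIL_OK: false instead of die
    return $s;
}

my $flagged = "\xE9";
utf8::upgrade($flagged);                 # chars < 256, utf8 flag on

my $ok  = as_bytes($flagged);            # succeeds, flag now off
my $bad = as_bytes("\x{2660}");          # wide character: cannot be bytes

print defined $ok  ? "bytes ok\n"      : "not bytes\n";
print defined $bad ? "bytes ok\n"      : "not bytes\n";
```

Without the FAIL_OK argument, utf8::downgrade dies on the second input, which may be the preferable behaviour just before handing data to something that expects octets.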
