Re^4: Confusing UTF-8 bug in CGI-script

it doesn't do any harm in practice

I don't know how you can say that after saying yourself that removing it also fixes the OP's problem.

Update: Well, you said that removing `use open` fixes the issue, but I doubt you're claiming that binmoding the output handles leads to a decoding error, so that leaves the binmoding of the input handle.

Re^5: Confusing UTF-8 bug in CGI-script by Anonyrnous Monk (Hermit) on Feb 01, 2011 at 19:25 UTC
How can you say that? You said yourself...

If you re-read carefully what I said, you'll see that I said the script works both as is and when I remove those `use` statements (except for the special case I mentioned in the correction).
Re^6: Confusing UTF-8 bug in CGI-script by ikegami (Patriarch) on Feb 01, 2011 at 20:15 UTC
OK, I follow.

Note that even if it seems to work, that doesn't make what you say correct. Whether the OP's code works depends on how the client encodes the request, and working for one client doesn't make it right. I suspect one of two reasons for the differences:

1. Your client encoded the request such that the initial decoding is a no-op (e.g. it %-encodes every byte with the 8th bit set). You won't be so lucky with a different client.
2. Your version of `decode` silently does nothing (instead of dying) when it guesses a double-decode is being attempted. If so, that will make the OP's code work for all but unlikely inputs, but it means you're relying on `decode` to catch your bug.

Update: Cleaned up. Replaced first paragraph. (It was "Gotcha.", which is ambiguous.)
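For readers following along, the double-decode failure mode and the %-encoding no-op described in this post can be sketched roughly as below. This is only an illustration under assumptions: it uses the core Encode module, and the sample input (a euro sign) and all variable names are made up for the example rather than taken from the OP's script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for U+20AC (euro sign), as a client might send them raw.
# (Sample data chosen for illustration only.)
my $bytes = "\xE2\x82\xAC";

# First decode: three bytes become one character.
my $chars = decode('UTF-8', $bytes);
printf "after first decode: %d character(s)\n", length $chars;

# Second decode of the already-decoded string. It now holds a code point
# above 0xFF, so Encode cannot treat it as bytes and croaks.
my $second_ok = eval { decode('UTF-8', $chars); 1 } ? 1 : 0;
my $error     = $second_ok ? '' : $@;
print "second decode ", ($second_ok ? "succeeded\n" : "died: $error");

# A request that %-encodes every high byte is pure ASCII on the wire,
# so decoding the raw input early changes nothing.
my $escaped = '%E2%82%AC';
my $same    = decode('UTF-8', $escaped) eq $escaped ? 1 : 0;
print "decode of %-encoded ASCII is a no-op: $same\n";
```

Note that the second decode only dies here because the string contains a character above 0xFF; an already-decoded string made up solely of characters in the 0x80 to 0xFF range would not trigger that particular error.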
Re^7: Confusing UTF-8 bug in CGI-script by Anonyrnous Monk (Hermit) on Feb 01, 2011 at 21:59 UTC
I suspect one of two reasons for the differences: ...

Neither of those is the case. I've verified (by sending the request through a proxy) that the content is sent UTF-8 encoded (i.e. no %-encoding). And my Encode::decode also behaves normally (i.e. it would die with "Cannot decode string with wide characters"). This is, however, irrelevant, because CGI.pm has code to prevent double-decoding:

    sub _decode_utf8 {
        my ($self, $val) = @_;
        if (Encode::is_utf8($val)) {
            return $val;
        }
        else {
            return Encode::decode(utf8 => $val);
        }
    }

This sufficiently explains the behavior I observed and reported (for the input side).
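A small standalone sketch of how a guard like the one quoted from CGI.pm behaves when applied twice to the same value. The helper name `decode_once` and the sample bytes are hypothetical, introduced only for this illustration; the sketch mirrors the quoted logic but is not CGI.pm's own code path.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for the quoted guard: decode only when the value
# is not already flagged internally as a character string.
sub decode_once {
    my ($val) = @_;
    return Encode::is_utf8($val) ? $val : Encode::decode(utf8 => $val);
}

my $bytes = "\xE2\x82\xAC";        # raw UTF-8 bytes for U+20AC (sample data)
my $once  = decode_once($bytes);   # flag is off, so the bytes are decoded
my $twice = decode_once($once);    # flag is on, so the value is returned as is

printf "lengths: %d, %d\n", length $once, length $twice;
print "same string: ", ($once eq $twice ? "yes" : "no"), "\n";
```

Keep in mind that `Encode::is_utf8` reports Perl's internal UTF8 flag, so the guard is a heuristic: it tells you how the string is stored, not whether it was ever decoded from external input.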
Re^8: Confusing UTF-8 bug in CGI-script by ikegami (Patriarch) on Feb 02, 2011 at 01:51 UTC