Re: Confusing UTF-8 bug in CGI-script

Not sure if it helps, but your script works fine for me. I.e., when I drop it into the cgi-bin directory of an Apache server, view the page in Firefox, and enter some non-ASCII content into the textarea input field (e.g. by cut-n-pasting something from a Chinese web page), the data is echoed back from the script without an encoding problem. And if I trace the traffic between browser and web server, it's encoded as UTF-8, as expected.

The script also works when I comment out the lines `use locale;` and `use open ':std' => ':encoding(UTF-8)';` (which are superfluous at best, IMHO). And the `use utf8;` line is of course only required if the script source itself is in fact encoded in UTF-8 (the literal characters "öäüõžš¢ð€¶", in this case).
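To illustrate the `use utf8;` point, here is a minimal sketch. It assumes the file itself is saved as UTF-8, and the variable names are made up for the example. The pragma is lexically scoped, so the same literal can be compared with and without it:

```perl
use strict;
use warnings;

my ($as_chars, $as_bytes);

{
    use utf8;                  # literals in this block are decoded from UTF-8
    $as_chars = length "ö€";   # counted as characters
}

{
    no utf8;                   # literals in this block stay as raw source bytes
    $as_bytes = length "ö€";   # counted as bytes (2 for ö, 3 for €)
}

print "with use utf8: $as_chars, without: $as_bytes\n";
```

If the source contained only ASCII, both lengths would be identical and the pragma would make no difference.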

(Tested with CGI.pm versions 3.04, 3.15, 3.29, 3.48 and 3.49.)

Correction: with CGI-3.49/Perl-5.12.2, STDOUT needs to be explicitly declared as UTF-8 (either with `binmode STDOUT, ":utf8"`, or with `use open...`), otherwise I'm getting warnings "Wide character in print" in the error log. This is not the case with earlier versions.
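A small sketch of where that warning comes from, using in-memory filehandles as stand-ins for STDOUT so it runs outside a web server. The handle and variable names are illustrative only, not taken from the OP's script:

```perl
use strict;
use warnings;

my $text = "\x{20AC}";    # EURO SIGN as a decoded character string

# Collect warnings instead of sending them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Handle without an encoding layer: print warns about the wide character.
my $raw = '';
open(my $plain, '>', \$raw) or die "open: $!";
print {$plain} $text;
close $plain;

# Handle with an explicit UTF-8 layer: the string is encoded, no warning.
my $encoded = '';
open(my $layered, '>:encoding(UTF-8)', \$encoded) or die "open: $!";
print {$layered} $text;
close $layered;

printf "warnings: %d, encoded length: %d bytes\n",
    scalar(@warnings), length $encoded;
```

Calling `binmode` on an already open handle such as STDOUT adds the same kind of layer that the second `open` sets up here.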


Re^2: Confusing UTF-8 bug in CGI-script by ikegami (Patriarch) on Feb 01, 2011 at 18:40 UTC
"The script also works when I comment out the lines `use locale;` and `use open ':std' => ':encoding(UTF-8)';` (which are superfluous at best, IMHO)."

«`use locale;`» is indeed superfluous since he doesn't do any operations that use locales (`cmp`, `lc`, etc). It's not relevant to the OP's question since it doesn't affect encoding.

«`use open ':std' => ':encoding(UTF-8)';`» is not superfluous. Part of what it does is necessary, and the other part of what it does is wrong. Specifically:

```perl
BEGIN {
    # Wrong, and the cause of the OP's problem. See my reply to the OP.
    binmode(STDIN,  ':encoding(UTF-8)');

    # Necessary to encode the returned HTML.
    binmode(STDOUT, ':encoding(UTF-8)');

    # Necessary to encode error messages for the log.
    binmode(STDERR, ':encoding(UTF-8)');
}
```

It could be replaced with the following or something equivalent, but it shouldn't be eliminated.

```perl
BEGIN {
    binmode(STDIN);                        # Form data
    binmode(STDOUT, ':encoding(UTF-8)');   # HTML
    binmode(STDERR, ':encoding(UTF-8)');   # Error messages
}
```
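One way to see why the form data is better treated as bytes, sketched under the assumption of a URL-encoded POST body: the request body is plain ASCII with percent-escapes, and the UTF-8 bytes only appear after unescaping, which is the point where decoding applies. The helper `unescape_form_value` and the sample body are hypothetical. In a real script the CGI module does the unescaping.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: undo the percent-escaping of one form value.
# Takes a byte string and returns a byte string.
sub unescape_form_value {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ tr/+/ /;
    $bytes =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $bytes;
}

# Sample body as sent for the two characters U+00F6 and U+20AC
# from a page served as UTF-8.
my $posted = 'text=%C3%B6%E2%82%AC';

my ($name, $value) = split /=/, $posted, 2;
my $bytes = unescape_form_value($value);    # UTF-8 encoded bytes
my $chars = decode('UTF-8', $bytes);        # decoded character string

printf "%s: %d bytes, %d characters\n", $name, length $bytes, length $chars;
```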
Re^3: Confusing UTF-8 bug in CGI-script by Anonyrnous Monk (Hermit) on Feb 01, 2011 at 18:55 UTC
```perl
# Wrong, and the cause of the OP's problem. See my reply to the OP.
binmode(STDIN, ':encoding(UTF-8)');
```

That's what I would've thought, too, but interestingly, it doesn't do any harm in practice (I did try it), and

```perl
# Necessary to encode the returned HTML.
binmode(STDOUT, ':encoding(UTF-8)');
```

only seems to be required with newer versions of CGI.pm (as I mentioned). Older versions apparently did the encoding themselves before printing to STDOUT (?)
Re^4: Confusing UTF-8 bug in CGI-script by ikegami (Patriarch) on Feb 01, 2011 at 19:20 UTC
"it doesn't do any harm in practice"

I don't know how you can say that after saying yourself that removing it also fixes the OP's problem.

Update: Well, you said that removing `use open` fixes the issue, but I doubt you're claiming that binmoding output handles leads to a decoding error, so that leaves the binmoding of the input handle.
Re^5: Confusing UTF-8 bug in CGI-script by Anonyrnous Monk (Hermit) on Feb 01, 2011 at 19:25 UTC
Re^6: Confusing UTF-8 bug in CGI-script by ikegami (Patriarch) on Feb 01, 2011 at 20:15 UTC
Some notes below your chosen depth have not been shown here