Re^2: CGI hidden params vs. character encoding

> First of all, decode( 'utf8', $untrusted ) is a security issue.

Wouldn't that depend on what you do with the value that you get back from decode()? Also, what would be the remedy? I would expect it's okay to do something like eval { decode( 'UTF-8', $untrusted, Encode::FB_CROAK ) } and check $@, or maybe just pass the return value from decode() through a regex or other test for valid content.
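
A minimal sketch of the first option, for illustration only. The helper name `decode_strict` and the sample byte strings are made up here; they are not from the OP code.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: returns the decoded character string,
# or undef if the bytes are not well-formed UTF-8.
sub decode_strict {
    my ($bytes) = @_;

    # Work on a copy: when a CHECK value is given, decode() may
    # modify its source argument in place.
    my $copy = $bytes;

    # FB_CROAK makes decode() die on malformed input; eval traps that.
    my $chars = eval { decode( 'UTF-8', $copy, Encode::FB_CROAK ) };
    return $chars;    # undef when decode() died
}

my $good = decode_strict("caf\xC3\xA9 au lait");   # well-formed UTF-8 bytes
my $bad  = decode_strict("caf\xE9 au lait");       # lone 0xE9 is malformed

print defined $good ? "good: accepted\n" : "good: rejected\n";
print defined $bad  ? "bad: accepted\n"  : "bad: rejected\n";
```

Checking whether the result is defined avoids relying on `$@` alone. Note the strict "UTF-8" name rather than "utf8".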

> Secondly, UTF8 is a perl-specific encoding. UTF-8 is the actual encoding.
> I haven't pinpointed the problem, but changing UTF8 to UTF-8 throughout fixed the problem.

Okay... I had to try twice -- I didn't get all the "utf8" strings changed over to "UTF-8" on the first try, but after I fixed the one I had forgotten ("binmode STDOUT..."), it worked. How strange...

Thanks!!!

Re^3: CGI hidden params vs. character encoding by ikegami (Patriarch) on May 27, 2008 at 23:31 UTC
> it worked. How strange...

I found it strange too. I just clued in to what the error is.

First of all, `binmode STDOUT, ':utf-8';` is a no-op, since there's no "utf-8" layer.

    >perl -le"print binmode(STDERR, ':utf8')?1:0"
    1
    >perl -le"print binmode(STDERR, ':utf-8')?1:0"
    0
    >perl -le"print binmode(STDERR, ':encoding(utf8)')?1:0"
    1
    >perl -le"print binmode(STDERR, ':encoding(utf-8)')?1:0"
    1

If we do it properly (`:encoding(utf-8)`), we end up with your original problem.

Your problem is that you are double-encoding! You're telling CGI to encode your data using UTF8 (`-charset => 'utf-8'`) and then you encode it again using `binmode STDOUT, ":utf8";`.

The solution is to get rid of `binmode` completely and only use CGI's methods to output.

Re^4: CGI hidden params vs. character encoding by graff (Chancellor) on May 28, 2008 at 00:41 UTC
> Your problem is that you are double-encoding! You're telling CGI to encode your data using UTF8 (`-charset => 'utf-8'`) and then you encode it again using `binmode STDOUT, ":utf8";`.

But... But... Then why did the double-encoding show up only in that one place?? If the behavior were consistent throughout, I would understand, but I still can't figure out how I got the particular behavior that I did.

> The solution is to get rid of `binmode` completely and only use CGI's methods to output.

I'm not sure about that. If I comment out the "binmode STDOUT..." in the OP code (having fixed all other encoding specs to "UTF-8" as described), I get "Wide character in print" warnings showing up in the error log. Also, I don't think I should have to rely entirely on CGI methods for printing content.

Re^5: CGI hidden params vs. character encoding by ikegami (Patriarch) on May 28, 2008 at 01:24 UTC
> But... But... Then why did the double-encoding show up only in that one place??

Because the rest were ASCII characters, which come out of a UTF-8 encode unchanged no matter how many times you encode them.

    use Encode qw( encode );
    my $str = '<p>foo</p>';
    for (1..5) {
        print("$str\n");
        $str = encode('UTF-8', $str);
    }

Output:

    <p>foo</p>
    <p>foo</p>
    <p>foo</p>
    <p>foo</p>
    <p>foo</p>

> I'm not sure about that. If I comment out the "binmode STDOUT..." in the OP code (having fixed all other encoding specs to "UTF-8" as described), I get "Wide character in print" warnings showing up in the error log

ARGH! CGI doesn't seem to be encoding. What's `-charset` for, then!? I need to look into this more.

By the way, `<p/>` makes no sense. `<p/>text<p/>text` means `<p></p>text<p></p>text`, but what you want is `<p>text</p><p>text</p>`.
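
For contrast, here is a rough sketch of the same experiment with one non-ASCII character added. The sample strings and the `%samples` / `%sizes` names are made up for illustration and are not taken from the OP code.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Made-up samples: one pure ASCII, one containing U+00E9 (e with acute).
my %samples = (
    ascii => '<p>foo</p>',
    mixed => "<p>caf\x{E9}</p>",
);

my %sizes;
for my $name ( sort keys %samples ) {
    # First pass: characters to UTF-8 bytes.
    my $once = encode( 'UTF-8', $samples{$name} );

    # Second pass: each byte is treated as a Latin-1 character and
    # encoded again, which is the double-encoding under discussion.
    my $twice = encode( 'UTF-8', $once );

    $sizes{$name} = [ length($once), length($twice) ];
    printf "%-5s once: %2d bytes, twice: %2d bytes\n",
        $name, length($once), length($twice);
}
```

The ASCII sample is the same size after both passes. The other sample grows on the second pass, because each of the two bytes that represent U+00E9 is itself re-encoded as two bytes.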