
it worked. How strange...

I found it strange too. I just clued in to what the error is.

First of all,

binmode STDOUT, ':utf-8';

is a no-op, since there's no "utf-8" layer.

>perl -le"print binmode(STDERR, ':utf8')?1:0"
1
>perl -le"print binmode(STDERR, ':utf-8')?1:0"
0
>perl -le"print binmode(STDERR, ':encoding(utf8)')?1:0"
1
>perl -le"print binmode(STDERR, ':encoding(utf-8)')?1:0"
1
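Since binmode reports a rejected layer only through its return value (plus a warning, if warnings are on), a typo like ':utf-8' is easy to miss. A minimal sketch of guarding against that follows; set_layer is a hypothetical helper name, not something from the OP's code.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a PerlIO layer and fail loudly if Perl rejects it,
# instead of silently carrying on with the handle unchanged.
sub set_layer {
    my ($fh, $layer) = @_;
    binmode($fh, $layer)
        or die "binmode rejected layer '$layer'\n";
    return 1;
}

# A valid spec: the handle now encodes characters to UTF-8 on output.
set_layer(\*STDOUT, ':encoding(UTF-8)');
print "smiley: \x{263A}\n";
```

With the misspelled ':utf-8' the helper dies at the call site, so the problem shows up immediately rather than as odd output later on.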

If we do it properly (:encoding(utf-8)) we end up with your original problem.

Your problem is that you are double-encoding! You're telling CGI to encode your data using UTF8 (-charset => 'utf-8') and then you encode it again using binmode STDOUT, ":utf8";.

The solution is to get rid of binmode completely and only use CGI's methods to output.

Re^4: CGI hidden params vs. character encoding
by graff (Chancellor) on May 28, 2008 at 00:41 UTC
    Your problem is that you are double-encoding! You're telling CGI to encode your data using UTF8 (-charset => 'utf-8') and then you encode it again using binmode STDOUT, ":utf8";.

    But... But... Then why did the double-encoding show up only in that one place?? If the behavior were consistent throughout, I would understand, but I still can't figure out how I got the particular behavior that I did.

    The solution is to get rid of binmode completely and only use CGI's methods to output.

    I'm not sure about that. If I comment out the "binmode STDOUT..." in the OP code (having fixed all other encoding specs to "UTF-8" as described), I get "Wide character in print" warnings showing up in the error log. Also, I don't think I should have to rely entirely on CGI methods for printing content.

      But... But... Then why did the double-encoding show up only in that one place??

      Because the rest were ASCII characters.

      use Encode qw( encode );
      $str = '<p>foo</p>';
      for (1..5) {
          print("$str\n");
          $str = encode('UTF-8', $str);
      }

      <p>foo</p>
      <p>foo</p>
      <p>foo</p>
      <p>foo</p>
      <p>foo</p>
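For contrast, here is a small sketch of the same experiment with one non-ASCII character added. The strings and variable names are made up for illustration. ASCII comes through encode unchanged, while each extra pass over a non-ASCII character adds bytes.

```perl
use strict;
use warnings;
use Encode qw( encode );

my $ascii = '<p>foo</p>';
my $wide  = "caf\x{E9}";                 # "cafe" with e-acute, 4 characters

my $once  = encode('UTF-8', $wide);      # e-acute becomes the bytes C3 A9
my $twice = encode('UTF-8', $once);      # C3 and A9 are each encoded again

# ASCII is identical before and after encoding.
printf "ascii: %d chars -> %d bytes\n",
    length($ascii), length(encode('UTF-8', $ascii));

# The non-ASCII string grows on every pass.
printf "wide:  %d chars -> %d bytes once -> %d bytes twice\n",
    length($wide), length($once), length($twice);
```

So a page that is almost entirely ASCII only shows the damage in the few places that hold non-ASCII text.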

      I'm not sure about that. If I comment out the "binmode STDOUT..." in the OP code (having fixed all other encoding specs to "UTF-8" as described), I get "Wide character in print" warnings showing up in the error log

      ARGH! CGI doesn't seem to be encoding. What's -charset for, then!? I need to look into this more.

      By the way, <p/> makes no sense. <p/>text<p/>text means <p></p>text<p></p>text, but <p>text</p><p>text</p> is what you want.