ryantate has asked for the wisdom of the Perl Monks concerning the following question:

I have one mod_perl2 app that outputs Latin-1 by default and another that outputs UTF-8-encoded Unicode. I am sending the appropriate Content-Type headers, but how do I make sure Apache/mod_perl won't mangle my UTF-8 when it prints my text to the browser?

When printing from Perl to a file, STDOUT or CGI STDOUT, I am supposed to binmode the filehandle/STDOUT with a :utf8 layer. But if I do this under mod_perl, I'll be binmoding STDOUT differently for each application. Won't that royally screw things up? The docs for binmode say one should set it as soon as possible after opening the handle.
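For reference, the plain-Perl (CLI or CGI) idiom described above looks roughly like this. It is a minimal sketch: `print_utf8` is a hypothetical helper, and it uses `:encoding(UTF-8)` in place of `:utf8` (both write UTF-8 on output for ordinary text).

```perl
use strict;
use warnings;

# Hypothetical helper: push an :encoding(UTF-8) layer onto the handle,
# then print a Perl character string, which goes out as UTF-8 octets.
sub print_utf8 {
    my ($fh, $text) = @_;
    binmode $fh, ':encoding(UTF-8)' or die "binmode failed: $!";
    print {$fh} $text;
}

# "cafe" with U+00E9, plus U+263A WHITE SMILING FACE, held as characters.
my $text = "caf\x{e9} \x{263a}\n";
print_utf8(\*STDOUT, $text);   # bytes: 63 61 66 C3 A9 20 E2 98 BA 0A
```

Without the binmode, the U+263A character would trigger a "Wide character in print" warning, which is the symptom the layer is there to prevent.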

Besides, I'm not sure mod_perl2 even uses STDOUT. From what I understand, it just intercepts print calls and handles them in some other way. So I'm not sure setting binmode on STDOUT has any relevance for mod_perl2.

I have had a frustrating experience trying to find answers on this. The mod_perl2 changefile notes "much better support for Unicode", but I could not find anything concrete in the docs, or if I did, I did not understand it. A search of the mod_perl2 docs for Unicode turns up only the changefile and a troubleshooting tip involving byte order marks.

I found a BINMODE in Apache2::RequestIO, but it is part of the "TIE interface", which does not sound like what I am looking for, and it is documented as a "NoOP", which does not sound promising anyway.

Not much luck with Google or Usenet searches either.

Any help is much appreciated.

Re: Unicode and mod_perl 2
by randyk (Parson) on Mar 12, 2006 at 20:11 UTC
      Thanks, this is quite useful. From reading the comments in the tests you linked to, it looks like Stas & Co have made binmode 'Just Work', and I am going to guess that binmode only affects the current request object, so it is not a big deal to have different binmode settings for different apps.

      Doing $r->print($utf8_content) apparently just works, too (based on the comments in the linked tests).
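One way to sidestep the layer question entirely is to encode the character string to UTF-8 bytes yourself before handing it to `$r->print`. This is a minimal sketch of that alternative, not what the linked tests do; the package name `My::Utf8Page` is hypothetical, and the Apache2 modules a real handler would load are only mentioned in a comment so the code compiles without Apache.

```perl
package My::Utf8Page;    # hypothetical handler package name
use strict;
use warnings;
use Encode qw(encode);

# A real mod_perl2 handler would also load Apache2::RequestRec (for
# content_type), Apache2::RequestIO (for print) and
# Apache2::Const -compile => qw(OK); they are omitted so this sketch
# compiles without Apache.

sub handler {
    my ($r) = @_;
    my $page = "<p>caf\x{e9} \x{263a}</p>\n";    # a character string
    $r->content_type('text/html; charset=utf-8');
    # Encode to UTF-8 octets first, so what reaches the client does not
    # depend on any output layer being set.
    $r->print(encode('UTF-8', $page));
    return 0;    # numeric value of Apache2::Const::OK
}

1;
```

To try it outside Apache, pass any object that has `content_type` and `print` methods; under mod_perl2 it would be wired up with `SetHandler modperl` and `PerlResponseHandler My::Utf8Page`.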

      Also, having re-read the binmode docs, I see that it is explicitly OK to reset the layer while the handle is in use, in any case.

      Thanks!

Re: Unicode and mod_perl 2
by spiritway (Vicar) on Mar 12, 2006 at 20:39 UTC
      Thanks, I've read the first two and am familiar with setting up Apache charsets. My question is specifically about getting a known UTF-8 string printed through mod_perl, given the aforementioned differences with STDOUT. Thanks nevertheless.