in thread Getting mad with CGI::Application and utf8

First of all, thanks for your reply!

Did you check if CGI::Fast returns text strings (i.e. with UTF-8 flag set)?
No, because I don't know offhand how to check.

use open ':utf8' only affects open (iirc), so it's useless unless you open files.
OK. But what about caches, like File::Cache? Should I turn on :utf8 for open when I use these, or will the module handle it internally?
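
For reference, here is a minimal sketch of what the pragma does and does not cover. It assumes a throwaway temp file and says nothing about how File::Cache opens its files internally; that would have to be checked in the module itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# "Grüße ☺" written with escapes, so this file needs no "use utf8".
my $text = "Gr\x{fc}\x{df}e \x{263a}";

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

{
    use open ':utf8';    # lexically scoped: default layer for open() in this block
    open my $out, '>', $path or die "open for write: $!";
    print {$out} $text;
    close $out;
}

# Outside that block no default layer applies, so this read returns raw bytes.
open my $raw, '<', $path or die "open for read: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# An explicit layer on the open call works regardless of the pragma.
open my $in, '<:encoding(UTF-8)', $path or die "open for read: $!";
my $chars = do { local $/; <$in> };
close $in;

printf "bytes on disk: %d, characters read back: %d\n",
    length $bytes, length $chars;
```

The pragma only changes the default for open calls compiled inside its lexical scope, so code in another module is unaffected by a use open in the calling script.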

Do you use binmode STDOUT, ':utf8';?
I did; actually I used binmode STDOUT, ":encoding(utf8)";. But that seems to break CGI-Application-Plugin-CompressGzip! (More hair pulling...) Should I set it again? And should I also set binmode STDIN, ":encoding(utf8)"; again, or is that redundant with my my $param_f = decode("utf8", $q->param("f")); procedure?
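
One possible arrangement, sketched below, is to decode once on input and encode once on output, leaving STDOUT without an encoding layer so that anything emitting binary data is not re-encoded. Whether this suits the CompressGzip plugin is an assumption that would need testing; $raw_param is a hypothetical stand-in for the value of $q->param("f").

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for $q->param("f"): UTF-8 bytes as sent by the browser.
my $raw_param = "caf\xc3\xa9";

# Decode exactly once, at the input boundary. LEAVE_SRC keeps $raw_param intact.
my $param_f = decode('UTF-8', $raw_param, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Work with characters in between.
my $reply = uc $param_f;

# Encode exactly once, at the output boundary.
my $body = encode('UTF-8', $reply);

binmode STDOUT;    # plain byte stream, no :encoding layer
print $body, "\n";
```

With the parameter decoded explicitly like this, an encoding layer on STDIN should not be needed for it; decoding it a second time would be the thing to avoid.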

Don't use Encode::_utf8_on($flagOn); - it's an internal method of the Encode module, and shouldn't be called from the outside. Use $string = decode_utf8 $string; instead.
I know. But I inspected the returned strings (with local $Data::Dumper::Useqq = 1;) and found that they were well-formed UTF-8, until they passed through the final stages of CGI::Application, which broke them again. So I tried various solutions and found out that yes, the string was proper UTF-8 but without the flag on. When I switched it on manually, C::A left the strings alone and seemed to pass them through to the browser intact. (You see, I am in deep trouble...)

Update: I now use decode_utf8 and it works just as well.
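
A small sketch of why decoding is the safer route than flipping the flag: decoding can validate its input, whereas setting the flag by hand does no checking at all. The example uses the strict decode('UTF-8', ...) form with FB_CROAK rather than decode_utf8; the byte strings are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $good = "\xe2\x82\xac";    # UTF-8 bytes for U+20AC EURO SIGN
my $bad  = "\xe2\x82";        # the same sequence, truncated

# Well-formed input decodes into a single character.
my $euro = decode('UTF-8', $good, Encode::FB_CROAK | Encode::LEAVE_SRC);
printf "decoded %d bytes into %d character(s)\n", length $good, length $euro;

# Malformed input raises an exception instead of slipping through silently.
my $ok = eval {
    decode('UTF-8', $bad, Encode::FB_CROAK | Encode::LEAVE_SRC);
    1;
};
print "malformed input rejected\n" unless $ok;
```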

Regarding test data
I tested with a 30K tar.gz, a 1K Perl script and a 500K mp3 file. Each shows the same problem: they only sometimes come through.

Re^3: Getting mad with CGI::Application and utf8
by moritz (Cardinal) on Feb 26, 2008 at 12:28 UTC
    You can check if the UTF-8 flag is set with Devel::Peek, which can dump all the internal flags of a scalar variable.

    There is also utf8::is_utf8, but I somehow suspect that the results might be subtly different (not sure yet, haven't really tried. Perhaps the difference that I thought I had noticed came from somewhere else.)
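
    As a rough sketch of both checks, for debugging only: the two test strings below are made up, and Dump writes to STDERR, where the FLAGS line shows whether UTF8 is set.

    ```perl
    use strict;
    use warnings;
    use Devel::Peek qw(Dump);
    use Encode qw(decode);

    my $bytes = "\xc3\xa4";                # two octets, not yet decoded
    my $text  = decode('UTF-8', $bytes);   # one character, U+00E4

    for my $scalar ($bytes, $text) {
        printf "length=%d is_utf8=%s\n",
            length $scalar, utf8::is_utf8($scalar) ? 'yes' : 'no';
        Dump($scalar);    # look for UTF8 in the FLAGS line on STDERR
    }
    ```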

      I will have a look, but I suspect the problem is not in CGI::Fast. Any comments on my other remarks?

      There is also utf8::is_utf8, but I somehow suspect that the results might be subtly different [from what Devel::Peek reports]

      They're not. However, is_utf8 is to be avoided because it's too easy to use it when you shouldn't be doing that. In general, you should not be looking at the state of the UTF8 flag unless you're a Perl developer, or wish to learn about Perl's guts. In general, learn about the IOK, NOK, and POK flags first, and then treat the UTF8 flag as if it was called UOK.

        They're not. However, is_utf8 is to be avoided because it's too easy to use it when you shouldn't be doing that. In general, you should not be looking at the state of the UTF8 flag unless you're a Perl developer, or wish to learn about Perl's guts.

        So as the average John Doe Perl hacker, what should I use to find out if a certain module or sub returns text strings or binary strings?

        Very often that's only poorly documented, or not at all, and I don't think that "reading the source code" is good advice either.
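
        One flag-independent heuristic, sketched here with a hypothetical helper named guess_string_kind, is to look at the contents of a sample return value rather than at its internals. It is only a guess: ASCII-only samples tell you nothing, so it needs test input containing non-ASCII characters.

        ```perl
        use strict;
        use warnings;
        use Encode qw(decode);

        # Hypothetical helper: classify a value by its contents, not by the UTF8 flag.
        sub guess_string_kind {
            my ($s) = @_;

            # Code points above 0xFF cannot be octets, so this must be decoded text.
            return 'text' if $s =~ /[^\x00-\xFF]/;

            # Pure ASCII looks identical whether decoded or not.
            return 'ambiguous' if $s !~ /[^\x00-\x7F]/;

            # High octets that form valid UTF-8 are most likely undecoded bytes.
            my $copy    = $s;
            my $decodes = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
            return $decodes ? 'probably undecoded UTF-8' : 'Latin-1 text or binary';
        }

        for my $sample ("\x{20ac}", "abc", "\xe2\x82\xac", "\xe4") {
            printf "%-12s %s\n", length($sample) . ' char(s):', guess_string_kind($sample);
        }
        ```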