in reply to Getting mad with CGI::Application and utf8

Now - occasionally! (which it also does under simple cgi not fast cgi operation and seems to be connected to overall load) - my upload crashes with CGI.pm (version 3.33) throwing "Malformed utf8" in apache's error.log

What does "occasionally" mean? Does it always die for the same set of data?

And finally: does the data contain malformed UTF-8?

Re^2: Getting mad with CGI::Application and utf8
by isync (Hermit) on Feb 26, 2008 at 11:51 UTC
    First of all, thanks for your reply!

    Did you check if CGI::Fast returns text strings (i.e. with UTF-8 flag set)?
    No, as I don't know offhand how to.

    use open ':utf8' only affects open (iirc), so it's useless unless you open files.
    Ok. But, mh, what about caches, like File::Cache? Should I turn on :utf8 for open when I use these, or will the module handle it internally?

    Do you use binmode STDOUT, ':utf8';?
    I did, actually I used binmode STDOUT, ":encoding(utf8)";. But then it seems to break CGI-Application-Plugin-CompressGzip! (more hair pulling..) Should I set it again? And should I also set binmode STDIN, ":encoding(utf8)" again (or is this redundant with my my $param_f = decode("utf8", $q->param("f")) procedure)?

    Don't use Encode::_utf8_on($flagOn); - it's an internal method of the Encode module, and shouldn't be called from the outside. Use $string = decode_utf8 $string; instead.
    I know. But I inspected the returned strings (local $Data::Dumper::Useqq = 1;) and found out they were properly formed utf8. Until they passed through the final stages of CGI::Application, which broke them again. So I tried various solutions and found out that yes, the string was proper utf8 but without the flag on. When I switched it on manually, C::A left them alone and seemed to pass them through to the browser stage. (you see, I am deeply entangled in trouble...)

    Update: I now use decode_utf8 and it works just as well.
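
The difference between flipping the flag and decoding can be sketched as follows. This is only an illustration under assumptions: the variable $raw_bytes is a made-up stand-in for whatever bytes $q->param("f") returns, and the strict check flags are one possible choice, not something the posters say they used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the raw bytes of a CGI parameter:
# "é" (U+00E9) in UTF-8 is the two bytes 0xC3 0xA9.
my $raw_bytes = "caf\xC3\xA9";

# Decode bytes into a character string. FB_CROAK makes malformed input
# die right here; LEAVE_SRC keeps decode() from modifying $raw_bytes.
my $text = decode('UTF-8', $raw_bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

# length() counts bytes for the raw string and characters for the decoded one.
printf "bytes: %d, characters: %d\n", length($raw_bytes), length($text);
# prints "bytes: 5, characters: 4"

# Encode back to bytes before writing to a handle that has no encoding layer.
my $out = encode('UTF-8', $text);
```

Unlike Encode::_utf8_on, decoding validates the input, so broken data fails at the point of entry rather than somewhere further down the request.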

    Regarding testdata
    I tested with a 30K tar.gz, a 1K perl script and a 500K mp3 file - each shows the same problem, they only sometimes come through..
      You can check if the UTF-8 flag is set with Devel::Peek, which can dump all the internal flags of a scalar variable.

      There is also utf8::is_utf8, but I somehow suspect that the results might be subtly different (not sure yet, haven't really tried. Perhaps the difference that I thought I had noticed came from somewhere else.)
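
A minimal sketch of both inspection routes, meant for debugging only. The sample strings are invented for the example; Dump writes its report to STDERR, and the interesting part is whether UTF8 appears in the FLAGS line.

```perl
use strict;
use warnings;
use Devel::Peek;            # exports Dump by default
use Encode qw(decode);

# Invented sample data: UTF-8 encoded bytes and their decoded form.
my $bytes = "caf\xC3\xA9";
my $text  = decode('UTF-8', $bytes);

Dump($bytes);   # FLAGS line lists POK but not UTF8
Dump($text);    # FLAGS line lists POK and UTF8

# utf8::is_utf8 reports the same internal flag as a boolean.
printf "bytes flagged: %s, text flagged: %s\n",
    (utf8::is_utf8($bytes) ? "yes" : "no"),
    (utf8::is_utf8($text)  ? "yes" : "no");
```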

        I will have a look, but I suspect the problem is not in CGI::Fast. Any comments on my other remarks?

        There is also utf8::is_utf8, but I somehow suspect that the results might be subtly different [from what Devel::Peek reports]

        They're not. However, is_utf8 is to be avoided because it's too easy to use it when you shouldn't be doing that. In general, you should not be looking at the state of the UTF8 flag unless you're a Perl developer, or wish to learn about Perl's guts. Learn about the IOK, NOK, and POK flags first, and then treat the UTF8 flag as if it was called UOK.

Re^2: Getting mad with CGI::Application and utf8
by Juerd (Abbot) on Feb 26, 2008 at 20:41 UTC

    Did you check if CGI::Fast returns text strings (i.e. with UTF-8 flag set)?

    Please note that the absence of the UTF8 flag does not mean that the string is not a text string.

    Checking if the UTF8 flag is set should be done only by people who know about Perl's Unicode internals. To all other people, it will only add to the confusion.

    Properly decode and encode, and all will be fine. (Though you may need an occasional utf8::upgrade, see also Unicode::Semantics.)
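
A rough sketch of that advice: decode at the input boundary, encode at the output boundary, and upgrade where Unicode semantics are needed. The helper names decode_input and encode_output are hypothetical and not part of CGI::Application or Encode; the sample data is invented, and the script assumes no unicode_strings feature and no locale is in effect.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical boundary helpers (names invented for this example).
sub decode_input  { decode('UTF-8', $_[0], Encode::FB_CROAK | Encode::LEAVE_SRC) }
sub encode_output { encode('UTF-8', $_[0]) }

my $name = decode_input("J\xC3\xBCrgen");   # bytes in, characters out
my $page = "Hello, $name!";                  # work with characters in between
my $body = encode_output($page);             # characters in, bytes out

# A text string may lack the UTF8 flag: "\xE9" is the character U+00E9
# stored in Perl's compact one-byte-per-character form.
my $e_acute = "\xE9";
my $before  = $e_acute =~ /\w/ ? 1 : 0;      # 0: ASCII-only \w rules apply

# utf8::upgrade changes the internal storage only, not the characters,
# but Perl then applies Unicode rules to the string.
utf8::upgrade($e_acute);
my $after   = $e_acute =~ /\w/ ? 1 : 0;      # 1: U+00E9 is a word character

print "before upgrade: $before, after upgrade: $after\n";
```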