in reply to Unicode and Forms

If you're sure you're always getting UTF-8 from the browsers, then everything should be hunky-dory after using Encode::_utf8_on.

However, I've found that some browsers can get confused about which character encoding the HTML is in, and consequently about which encoding to use when sending form data back to the server.

I've found that if the Content-Type: output header has the string "; charset=utf-8" appended, then things tend to work out. So, for text/html, the header would then read:

Content-Type: text/html; charset=utf-8
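
For a plain CGI script, emitting that header could look roughly like the sketch below. The helper name html_header is made up for illustration, and the binmode call assumes you are printing character strings that should go out as UTF-8 bytes so that the output actually matches the declared charset.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns the header block
# for an HTML response that declares UTF-8 as its charset.
sub html_header {
    return "Content-Type: text/html; charset=utf-8\r\n\r\n";
}

# Encode everything printed to STDOUT as UTF-8, so the bytes sent
# agree with the charset announced in the header.
binmode(STDOUT, ':encoding(UTF-8)');

print html_header();
print "<html><body><p>form goes here</p></body></html>\n";
```

If you use a framework or CGI.pm, it will have its own way of setting the charset; the point is only that the header and the actual output encoding should agree.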

Hope this helps.

Liz

Re: Re: Unicode and Forms
by graff (Chancellor) on Dec 15, 2003 at 03:00 UTC
    ... everything should be hunky-dory after using Encode::_utf8_on.

    ... unless of course something isn't hunky-dory to begin with. To quote the Encode man page:

    Messing with Perl's Internals

    The following API uses parts of Perl's internals in the current implementation. As such, they are efficient but may change.
    ...
    _utf8_on(STRING)
    INTERNAL Turns on the UTF-8 flag in STRING. The data in STRING is not checked for being well-formed UTF-8. Do not use unless you know that the STRING is well-formed UTF-8. Returns the previous state of the UTF-8 flag (so please don't treat the return value as indicating success or failure), or "undef" if STRING is not a string.

    (emphasis in the original). If there's any chance that the incoming data might really not be proper utf8, then just treating it as if it were utf8 won't help.

    The safer, more stable (non-internal) method for "upgrading" a string to utf8 is covered in the Encode man page above the section quoted here, as is the part about "Handling Malformed Data", which might be relevant to the OP.
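
    As a rough illustration of that non-internal route, here is a minimal sketch using Encode's decode() with the FB_CROAK check value, so that malformed input is reported instead of silently flagged as utf8. The helper name decode_form_value and the sample byte strings are invented for the example; how you actually get the raw bytes depends on your form-handling code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw form bytes as UTF-8.
# Returns the decoded character string, or undef if the bytes are
# not well-formed UTF-8.
sub decode_form_value {
    my ($bytes) = @_;    # work on a copy; decode() with a CHECK value
                         # may modify its source argument in place
    # FB_CROAK makes decode() die on malformed data rather than
    # substituting a replacement character; eval catches that.
    my $chars = eval { decode('UTF-8', $bytes, Encode::FB_CROAK) };
    return $chars;
}

my $good = decode_form_value("caf\xC3\xA9 au lait");  # well-formed UTF-8 bytes
my $bad  = decode_form_value("caf\xE9 au lait");      # Latin-1 byte, not UTF-8

print "good input: ", (defined $good ? "decoded" : "malformed"), "\n";
print "bad input:  ", (defined $bad  ? "decoded" : "malformed"), "\n";
```

    What to do with a value that fails to decode (reject it, retry as Latin-1, substitute characters) is the policy question that the "Handling Malformed Data" section of the man page goes into.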