in reply to Re: UTF-8 or iso-8859-1 input to CGI.pm
in thread UTF-8 or iso-8859-1 input to CGI.pm

I don't readily see why the character set used to encode the request should be an issue. As far as I know, it's UTF-8, with HTML escapes used to encode any special characters you might need.

The browser then tells the server which character sets it will accept, and the server either delivers the response as promised or informs the client that it cannot.

I do not profess to be a wizard on this one.

Re^3: UTF-8 or iso-8859-1 input to CGI.pm
by moritz (Cardinal) on Mar 02, 2009 at 13:14 UTC
    The request itself is always ASCII - that's the spec.

    However, the URL-encoding scheme (as in %DE%AD%BE%EF) encodes only bytes, so you still need to pick a character encoding to interpret them.
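To illustrate the point that percent-encoding carries bytes rather than characters, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about Perl's CGI.pm). The helper name `decode_query_value` is hypothetical. It shows the same escaped bytes turning into different text depending on which charset the receiver assumes.

```python
from urllib.parse import unquote_to_bytes


def decode_query_value(encoded, charset):
    """Undo the %XX escapes (giving raw bytes), then decode with the given charset.

    Hypothetical helper for illustration; not part of CGI.pm or any library.
    """
    raw = unquote_to_bytes(encoded)  # percent-decoding yields bytes, not text
    return raw.decode(charset)       # the charset choice is made by the receiver


# The same two bytes read as one character in UTF-8 and as two in latin-1.
as_utf8 = decode_query_value("%C3%A9", "utf-8")
as_latin1 = decode_query_value("%C3%A9", "latin-1")
print(repr(as_utf8), repr(as_latin1))
```

Nothing in the escaped string itself says which reading is right, which is why sender and receiver have to agree on the encoding out of band.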

    When I used latin-1 for this encoding, some browsers sent me requests with latin-1 encoded data, even though the pages themselves were encoded in UTF-8 (and declared as such). So I guess they decoded the URLs, did some encoding guesswork, and used the result for GET requests.
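When a server cannot rely on browsers being consistent, one common workaround is to try UTF-8 first and fall back to latin-1 if the bytes are not valid UTF-8. The sketch below is an assumption about how that might look, not something described in this thread; it is written in Python for illustration, and `decode_with_fallback` is a hypothetical name.

```python
from urllib.parse import unquote_to_bytes


def decode_with_fallback(encoded):
    """Return (text, charset_used) for a percent-encoded value.

    Hypothetical heuristic: strict UTF-8 first, latin-1 as a fallback.
    latin-1 maps every byte to a character, so the fallback never fails.
    """
    raw = unquote_to_bytes(encoded)
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


print(decode_with_fallback("caf%C3%A9"))  # valid UTF-8 bytes
print(decode_with_fallback("caf%E9"))     # a lone 0xE9 is not valid UTF-8
```

This is still guesswork: some latin-1 byte sequences happen to be valid UTF-8 and will be misread, so it is a safety net rather than a substitute for using one encoding throughout.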

    Which is why I recommend consistency ;-)