Re: UTF-8 or iso-8859-1 input to CGI.pm

As far as I know there's no reliable way to determine the encoding that was used.

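What you can do, however, is pick one encoding, say UTF-8, and apply it consistently to everything: the declared charset of your pages, the decoding of incoming parameters, and the encoding of your output.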
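A minimal sketch of that idea, not necessarily the code originally posted here. It assumes UTF-8 is the chosen encoding and uses only the core Encode module; `decode_param` is a hypothetical helper name, not part of CGI.pm.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn the raw bytes of one (already percent-decoded)
# parameter value into a Perl character string, always assuming UTF-8.
sub decode_param {
    my ($bytes) = @_;
    # FB_CROAK: die on malformed UTF-8 instead of silently substituting.
    # LEAVE_SRC: do not modify $bytes while decoding.
    return decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Encode everything we print with the same encoding the page declares.
binmode STDOUT, ':encoding(UTF-8)';
print "Content-Type: text/html; charset=UTF-8\r\n\r\n";

my $name = decode_param("Z\xC3\xBCrich");   # UTF-8 bytes for "Zürich"
print "Hello, $name (", length($name), " characters)\n";   # 6 characters, 7 bytes
```

CGI.pm also documents a `-utf8` import pragma (`use CGI qw(-utf8);`) that decodes parameters as UTF-8 for you; its documentation lists some caveats that are worth reading before relying on it.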


Re^2: UTF-8 or iso-8859-1 input to CGI.pm by locked_user sundialsvc4 (Abbot) on Mar 02, 2009 at 12:58 UTC
I'm not readily seeing how the character set used to encode the request is supposed to be an issue. As far as I know, it's UTF-8, with HTML escapes used to encode any special characters you might need. The browser then tells the server which character sets it will accept, and the server either delivers the content as promised or tells the client that it cannot be done. I do not profess to be a wizard on this one.
Re^3: UTF-8 or iso-8859-1 input to CGI.pm by moritz (Cardinal) on Mar 02, 2009 at 13:14 UTC
The request itself is always ASCII; that's the spec. However, the URL-encoding scheme (as in `%DE%AD%BE%EF`) encodes only bytes, so you need to pick a character encoding for them. When I used latin-1 for this encoding, some browsers sent me requests with latin-1 encoded data, even though the pages themselves were encoded in UTF-8 (and declared as such). So I guess they decoded the URLs, did some encoding guesswork, and used the result for GET requests. Which is why I recommend consistency ;-)
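To illustrate the point about bytes, here is a small sketch using only the core Encode module. `percent_decode` is a hypothetical helper written for this example (CGI.pm does its own unescaping internally), and the sample values are made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Percent-decoding yields raw bytes; it carries no information about
# which character encoding those bytes are in.
sub percent_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                               # form encoding uses '+' for space
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX -> one byte
    return $s;
}

my $from_utf8_form   = percent_decode('%C3%BC');  # "ü" sent as UTF-8: two bytes
my $from_latin1_form = percent_decode('%FC');     # "ü" sent as latin-1: one byte

# The same two bytes read under two different assumptions:
my $as_utf8   = decode('UTF-8',      $from_utf8_form);  # one character, U+00FC
my $as_latin1 = decode('ISO-8859-1', $from_utf8_form);  # two characters, U+00C3 U+00BC

printf "read as UTF-8:   %d character(s)\n", length $as_utf8;
printf "read as latin-1: %d character(s)\n", length $as_latin1;
```

Both readings are "valid" as far as the server can tell, which is why the encoding has to be agreed on up front rather than detected afterwards.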