in reply to UTF-8 or iso-8859-1 input to CGI.pm

As far as I know there's no reliable way to determine the encoding that was used.

What you can do, however, is pick one encoding, say UTF-8, and consistently apply it to everything.
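A minimal sketch of that idea, not the poster's own code: it is written in Python rather than Perl purely for illustration (in Perl, the Encode module's `decode` and `encode` play the same roles). The names `ENCODING`, `decode_params`, and `render_page` are hypothetical. The point is only that the same encoding is used to decode every incoming parameter and to encode and declare every outgoing page.

```python
from urllib.parse import parse_qs

ENCODING = "utf-8"  # assumption: the single encoding chosen for the whole application


def decode_params(query_string):
    """Decode every parameter of a raw query string with the chosen encoding."""
    # errors="strict" raises UnicodeDecodeError on bytes that are not valid
    # in ENCODING, instead of silently replacing them.
    return parse_qs(query_string, encoding=ENCODING, errors="strict")


def render_page(body):
    """Return (header, payload), declaring and applying the same encoding."""
    header = f"Content-Type: text/html; charset={ENCODING}"
    return header, body.encode(ENCODING)


params = decode_params("name=J%C3%BCrgen")
header, payload = render_page(f"<p>Hello, {params['name'][0]}</p>")
```

With strict decoding, input that arrives in some other encoding tends to fail loudly rather than turn into mojibake, which makes inconsistencies easier to spot.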

Re^2: UTF-8 or iso-8859-1 input to CGI.pm
by locked_user sundialsvc4 (Abbot) on Mar 02, 2009 at 12:58 UTC

    I'm not readily seeing how the character set used to encode the request is supposed to be an issue. As far as I know, it's UTF-8, with HTML escapes being used to encode all of the special characters that you might need.

    The browser, then, informs the server which character sets it will accept, whereupon the server either delivers the information as promised or informs the client that it cannot be done.

    I do not profess to be a wizard on this one.

      The request line itself, URL included, is always ASCII - that's the spec.

      However the URL-Encoding scheme with %DE%AD%BE%EF encodes only bytes, so you need to pick a character encoding.
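To illustrate why an encoding has to be chosen, here is a small sketch in Python (used only for illustration; the thread itself concerns Perl's CGI.pm). Percent-decoding yields raw bytes, and the same bytes give different text depending on the encoding applied afterwards. The sample values `%C3%A9` and `%E9` are made up for the example.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding produces bytes, not characters.
raw = unquote_to_bytes("%C3%A9")

# The same two bytes read as one character in UTF-8 ...
as_utf8 = raw.decode("utf-8")
# ... and as two characters in latin-1.
as_latin1 = raw.decode("latin-1")

# A latin-1 encoded "e with acute" (%E9) is not valid UTF-8 at all.
try:
    unquote_to_bytes("%E9").decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

Every byte sequence is valid latin-1, so a successful latin-1 decode proves nothing about what the sender intended.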

      When I used latin-1 for this encoding, some browsers sent me requests with latin-1 encoded data, even though the pages themselves were encoded in UTF-8 (and declared as such). So I guess they decoded the URLs, did some encoding guesswork, and used the result for GET requests.

      Which is why I recommend consistency ;-)