
The RFC specifies how both the client and the server should behave, and all of the information is carried in the HTTP headers.

Re^4: UTF-8 or iso-8859-1 input to CGI.pm
by wol (Hermit) on Mar 02, 2009 at 12:34 UTC
    Judging by Moritz' reply below, and both the HTTP and HTML links referenced, this is incorrect.

    The 'Accept-Charset' HTTP header is useful when the client (i.e. the browser) sends a request to the server, but it's not in the list of headers that are meaningful in the response (i.e. the content your CGI script sends back to the browser). You could include it anyway, but the browser will almost certainly ignore it, and because the HTTP standard doesn't define it as a response header, you run the risk of all sorts of interesting compatibility problems if some browsers do give it a proprietary meaning.

    See http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6.2 for the list of headers that are meaningful in an HTTP response.
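For contrast, here is a minimal sketch of the header that does carry encoding information in a response: the charset parameter of Content-Type. The helper name build_response is hypothetical and the markup is only an illustration; it uses nothing beyond the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of CGI.pm): build a minimal CGI response
# whose Content-Type header declares the encoding of the body.
sub build_response {
    my ($html, $charset) = @_;

    # Turn Perl characters into bytes in the declared encoding.
    my $body = encode($charset, $html);

    return "Content-Type: text/html; charset=$charset\r\n"
         . "Content-Length: " . length($body) . "\r\n"
         . "\r\n"
         . $body;
}

binmode STDOUT;    # we are printing raw bytes, not characters
print build_response("<p>caf\x{e9}</p>", 'UTF-8');
```

If you are already using CGI.pm, its header() method accepts a -charset argument that should give you the same Content-Type line without building it by hand.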

    Then follow Moritz' advice. :-)

    Update: There is a way for the browser to indicate which character encoding it used when it POSTed the form data to the server: it is part of the MIME multipart specification (usable in the body of the HTTP request). See http://www.faqs.org/rfcs/rfc1521.html, section 7.1.1. Unfortunately, I think this data is optional, and apart from that I don't know how you'd get access to that information in your CGI script anyway! Any other monks care to help on that point?
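One partial answer, offered as a sketch rather than a definitive recipe: under the CGI interface the request's own Content-Type header is exposed to the script as $ENV{CONTENT_TYPE}, so a charset parameter there (if the browser sent one, which many do not) can be parsed out. This does not cover the per-part headers inside a multipart body. The helper name charset_from_content_type and the sample values are hypothetical, and the iso-8859-1 fallback is an assumption taken from the default discussed in this thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: extract the charset parameter from a Content-Type
# value, falling back to iso-8859-1 when no charset is given.
sub charset_from_content_type {
    my ($content_type) = @_;
    return 'iso-8859-1' unless defined $content_type;

    if ($content_type =~ /;\s*charset\s*=\s*"?([^";\s]+)"?/i) {
        return lc $1;
    }
    return 'iso-8859-1';
}

# In a real CGI script you would pass $ENV{CONTENT_TYPE} here;
# a fixed sample value keeps this sketch self-contained.
my $charset = charset_from_content_type(
    'application/x-www-form-urlencoded; charset=UTF-8'
);

# Sample raw bytes, standing in for an already URL-decoded form value.
my $raw  = "caf\xC3\xA9";
my $text = decode($charset, $raw);    # bytes -> Perl characters

printf "charset=%s, characters=%d\n", $charset, length $text;
```

Note that decode() dies on an encoding name it doesn't recognise, so a production script would want to validate the charset (for example with Encode::find_encoding) before trusting what the client sent.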

    --
    use JAPH;
    print JAPH::asString();

      The browser is supposed to send whatever the server expects, or the default (iso-8859-1), and nothing else.