Re: UTF-8 or iso-8859-1 input to CGI.pm

Re^2: UTF-8 or iso-8859-1 input to CGI.pm by thedi (Acolyte) on Mar 02, 2009 at 09:59 UTC
This document describes how a browser can request a document in a desired encoding, that is, how a CGI script should encode its response. But I am looking for the encoding of the request: how can a CGI script find out in which encoding a request was sent from the browser to the server? In other words, how is the CGI input encoded? This matters when a request contains form input. Is this form input sent UTF-8 encoded or ISO-8859-1? Or rather: how can a CGI.pm-based script find out how the input was encoded? Thanks, Thedi gerber@id.ethz.ch
Re^3: UTF-8 or iso-8859-1 input to CGI.pm by Anonymous Monk on Mar 02, 2009 at 10:24 UTC
The RFC specifies how both client and server should behave, and all the information is in the HTTP headers.
Re^4: UTF-8 or iso-8859-1 input to CGI.pm by wol (Hermit) on Mar 02, 2009 at 12:34 UTC
Judging by Moritz' reply below, and both the HTTP and HTML links referenced, this is incorrect. The 'Accept-Charset' HTTP header is useful when the client (i.e. the browser) sends a request to the server, but it's not in the list of headers that are meaningful in the response (i.e. the content your CGI script sends back to the browser). You could include it anyway, but the browser will almost certainly ignore it, and because it's not in the HTTP standard for responses, you run the risk of all sorts of interesting compatibility problems if some browsers do give it some proprietary meaning. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6.2 for the list of headers that are meaningful in an HTTP response. Then follow Moritz' advice. :-)
Update: There is a way for the browser to indicate which character encoding it used when it POSTed the form data to the server: it's part of the MIME multipart specification (usable in the body of the HTTP request). See http://www.faqs.org/rfcs/rfc1521.html, section 7.1.1. Unfortunately, I think this data is optional, and apart from that I don't know how you'd get access to that information in your CGI script anyway! Any other monks care to help on that point?
-- use JAPH; print JAPH::asString();
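On the question of getting at that information from a script: under the CGI interface, the Content-Type header of the request is handed to the script in the CONTENT_TYPE environment variable, so a charset parameter can be read from there if the browser sent one. Below is a minimal, language-neutral sketch in Python rather than Perl. The function name declared_charset is made up for illustration, and a Perl script would read the same variable from %ENV. Since the parameter is optional, the script should expect to get nothing back and fall back to whatever encoding the form page itself was served in.

```python
import os
from email.message import Message


def declared_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value.

    Returns None when the value is empty or carries no charset parameter.
    (Hypothetical helper, not part of any CGI library.)
    """
    if not content_type:
        return None
    msg = Message()
    # Let the standard library's MIME header parser handle quoting and case.
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


if __name__ == "__main__":
    # CGI passes the request's Content-Type header in this variable.
    charset = declared_charset(os.environ.get("CONTENT_TYPE", ""))
    if charset is None:
        print("No charset declared; fall back to the encoding of the form page.")
    else:
        print("Request declares charset:", charset)
```

The same function can be applied to the Content-Type line of each part of a multipart/form-data body, which is where the RFC 1521 charset parameter would appear if the browser included it.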
Re^5: UTF-8 or iso-8859-1 input to CGI.pm by Anonymous Monk on Mar 03, 2009 at 10:24 UTC