in reply to Encoding confusion with CGI forms

What charset are you setting on the Content-Type header? Browsers should use that charset for form submissions. In my experience, UTF-8 works well, as does windows-1252, which is Windows' superset of iso-8859-1 (it adds printable characters such as curly quotes in the 0x80-0x9F range).
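To make the point about the Content-Type header concrete, here is a minimal sketch of a CGI response that declares its charset in the real HTTP header rather than only in a meta tag. It is written in Python purely for illustration; the function name `build_response` is made up for this example, and the important part is just that the declared charset and the bytes actually sent agree.

```python
def build_response(body_html: str, charset: str = "utf-8") -> bytes:
    """Build a complete CGI response (header, blank line, body).

    The body is encoded with the same charset that the header declares,
    so the browser has no reason to guess.
    """
    body = body_html.encode(charset)
    header = f"Content-Type: text/html; charset={charset}\r\n\r\n"
    return header.encode("ascii") + body
```

A script would write the result to standard output as bytes, for example with `sys.stdout.buffer.write(build_response(page))`.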

There is an accept-charset attribute on the form tag, but it is not well supported. Also, with POSTs, browsers should include a charset in the HTTP request's Content-Type header, but that is also not well supported.

This looks like a good primer:

Re^2: Encoding confusion with CGI forms
by davistv (Acolyte) on Oct 22, 2004 at 20:59 UTC
    My content-type header looks like:
    <meta http-equiv="content-type" content="text/html; charset=utf-8">

    The page, script, and Apache configuration are all set to UTF-8 right now, but when I paste text from MS Word into the form, for instance, all of the high-bit characters get stripped out. I just posted the source for the script to this thread; please take a look.

    And I'll definitely read that page on i18n form usage; thank you for the link. I'm not sure if it will fix the problem I'm dealing with or not, though. The biggest problem is that I can't count on the page encoding dictating the string encoding I get back, so I'm trying to detect it with a hidden form variable instead. The accept-charset attribute just doesn't seem to work with my client's machines.
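The hidden-variable idea can be sketched roughly as follows. This is an illustration in Python, not the script posted to the thread. It assumes a hypothetical hidden field named `enc_probe` whose value is the euro sign, chosen because its bytes differ between UTF-8 (E2 82 AC) and windows-1252 (80), and it assumes the form arrives as an ordinary urlencoded POST body. The raw bytes are inspected before any text decoding happens.

```python
from typing import Dict, Optional
from urllib.parse import unquote_to_bytes

# Hypothetical hidden field, e.g.
#   <input type="hidden" name="enc_probe" value="&#8364;">
PROBE_FIELD = "enc_probe"
PROBE_CHAR = "\u20ac"  # euro sign


def raw_fields(body: bytes) -> Dict[str, bytes]:
    """Split a urlencoded body into field names and raw byte values."""
    fields = {}
    for pair in body.split(b"&"):
        name, _, value = pair.partition(b"=")
        key = unquote_to_bytes(name.replace(b"+", b" ")).decode("ascii", "replace")
        fields[key] = unquote_to_bytes(value.replace(b"+", b" "))
    return fields


def detect_encoding(probe: bytes,
                    candidates=("utf-8", "windows-1252")) -> Optional[str]:
    """Return the first candidate encoding that turns the probe bytes
    back into the expected character, or None if none of them do."""
    for enc in candidates:
        try:
            if probe.decode(enc) == PROBE_CHAR:
                return enc
        except UnicodeDecodeError:
            continue
    return None


def decode_form(body: bytes) -> Dict[str, str]:
    """Decode every field using the encoding revealed by the probe field."""
    fields = raw_fields(body)
    enc = detect_encoding(fields.get(PROBE_FIELD, b""))
    if enc is None:
        raise ValueError("could not determine the submission encoding")
    return {name: value.decode(enc, "replace") for name, value in fields.items()}
```

If the browser submits in a charset that cannot represent the probe character at all, it may send a substitute such as `?` or a numeric entity. In that case `detect_encoding` returns None and the script has to fall back to some default of its own.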

    I might have to try the windows-1252 charset option you mention. I'm not sure if that'll fix all of the broken behavior either, though. I've still got to escape the text to Latin-1 with HTML entities before inserting it into MySQL (no Unicode support).
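For the Latin-1 escaping step, a minimal sketch in Python might look like the following. The function name `to_latin1_with_entities` is made up for illustration. It relies on Python's standard `xmlcharrefreplace` error handler, which leaves characters that Latin-1 can represent alone and turns everything else into a decimal numeric character reference.

```python
def to_latin1_with_entities(text: str) -> bytes:
    """Encode text as Latin-1, replacing any character Latin-1 cannot
    represent with a numeric reference such as &#8220;."""
    return text.encode("latin-1", errors="xmlcharrefreplace")
```

For example, Word-style curly quotes around "café" followed by a euro price come out as `&#8220;caf\xe9&#8221; &#8364;5` in bytes: the é stays a single Latin-1 byte, while the quotes and the euro sign become references. Note that this does not escape ampersands already present in the text, so a literal `&#8220;` typed by a user is indistinguishable from an escaped quote once stored.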

    Thank You,