qq has asked for the wisdom of the Perl Monks concerning the following question:

My company is about to create a site that will gather submissions from a very international audience. Each visitor will enter several paragraphs of text.

I'm worried because I don't really know anything about practical character encoding issues. I'd really like all data submitted by users to be utf-8. But can I enforce this? It seems that the accept-charset attribute of the form tag is not widely honoured.

I found a decent overview of the traps at http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html. I particularly like the 'buzzword' idea, where a string in a hidden field is used to guess what encoding the form was sent in.

So, is there a standard solution to this problem? Or several?

thanks, qq

Re: Text encoding in web form entries
by borisz (Canon) on Feb 19, 2004 at 23:28 UTC
    Send the page that includes the form in UTF-8, since some browsers send the data back in the charset they received the page in, regardless of what you tell them with accept-charset. Put a hidden field in the form whose value lets you check which charset you actually got back. Use the accept-charset attribute in your forms anyway. Check your hidden field and convert the received data back to UTF-8 if needed. Use POST requests.
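
    The hidden-field check described above could be sketched roughly as follows, using the core Encode module. This is only an illustration under assumptions, not a tested production routine: the sentinel string, the list of candidate charsets, and the helper names guess_charset and to_utf8 are all hypothetical. The idea is to compare the raw bytes of the hidden field with what the sentinel looks like in each candidate charset, then decode the real fields with whichever one matches.

    ```perl
    use strict;
    use warnings;
    use Encode qw(decode encode);

    # Hypothetical sentinel for the hidden field: e-acute plus the euro sign,
    # chosen because the candidate charsets below encode it differently.
    my $SENTINEL = "\x{E9}\x{20AC}";

    # Assumed list of charsets a browser might plausibly send back.
    my @CANDIDATES = ('UTF-8', 'cp1252', 'iso-8859-15');

    # Raw bytes of the sentinel in each candidate charset => charset name.
    my %CHARSET_FOR = map { ( encode($_, $SENTINEL), $_ ) } @CANDIDATES;

    # Return the charset the sentinel bytes were sent in, or undef if unknown.
    sub guess_charset {
        my ($raw_sentinel) = @_;
        return $CHARSET_FOR{$raw_sentinel};
    }

    # Convert a raw field value to UTF-8 bytes, or return undef on failure.
    sub to_utf8 {
        my ($raw_sentinel, $raw_text) = @_;
        my $charset = guess_charset($raw_sentinel);
        return undef unless defined $charset;
        # Die inside the eval on bytes that are invalid in that charset.
        my $chars = eval {
            decode($charset, $raw_text, Encode::FB_CROAK | Encode::LEAVE_SRC);
        };
        return undef unless defined $chars;
        return encode('UTF-8', $chars);
    }
    ```

    A caller would pass the raw bytes of the hidden field and of each text field, for example to_utf8($raw_sentinel, $raw_comment), and treat an undef result as "unknown or damaged encoding" to be rejected or logged. A browser that cannot represent the sentinel at all may substitute something else for it, in which case guess_charset simply returns undef.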
    Boris

      ++ and thanks

      Could I still have issues when people copy and paste from MS apps into the form? Will I need to run the demoronizer over some submissions?

      qq

        I do not know, but the trick is to put information into the form that gives you the power to recognise what happened to the data after the user has added something. Then convert it to a known state and continue processing. That works very well. I inherit from Apache::Request and process all incoming POST requests that include one of my keys; all others I pass through.
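
        One possible instance of "convert it to a known state" for text pasted from MS apps: when cp1252 bytes are decoded as ISO-8859-1, curly quotes and dashes end up as control characters in the range U+0080 to U+009F. A minimal sketch of a repair step to run after decoding, assuming a hypothetical helper named repair_c1 and a deliberately partial mapping table (data that genuinely arrived as UTF-8 should not need this):

        ```perl
        use strict;
        use warnings;

        # Partial table (an assumption, not exhaustive): the C1 control
        # character left behind => the character cp1252 meant by that byte.
        my %C1_TO_CP1252 = (
            "\x{80}" => "\x{20AC}",   # euro sign
            "\x{85}" => "\x{2026}",   # horizontal ellipsis
            "\x{91}" => "\x{2018}",   # left single quote
            "\x{92}" => "\x{2019}",   # right single quote
            "\x{93}" => "\x{201C}",   # left double quote
            "\x{94}" => "\x{201D}",   # right double quote
            "\x{96}" => "\x{2013}",   # en dash
            "\x{97}" => "\x{2014}",   # em dash
            "\x{99}" => "\x{2122}",   # trade mark sign
        );

        # Replace mapped C1 characters in an already-decoded string;
        # unmapped characters are left untouched.
        sub repair_c1 {
            my ($chars) = @_;
            $chars =~ s/([\x{80}-\x{9F}])/exists $C1_TO_CP1252{$1} ? $C1_TO_CP1252{$1} : $1/ge;
            return $chars;
        }
        ```

        The function works on decoded character strings, not raw bytes, so it belongs after the charset conversion step rather than before it.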
        Boris