in reply to Text Encoding on this site's HTML

I'm not opposed to interim solutions, but we should be working towards using UTF-8 exclusively. The Latin-1 character set is fine for Western European languages but cannot represent Eastern European or Asian ones. Win-Latin-1 (CP1252) is a stupid hack. UTF-8 is inclusive and easy, as long as the tools support it.

Perhaps the input forms could offer a menu choice for the input encoding, and everything could be converted to UTF-8 on input. Then all output could simply be sent as UTF-8; browsers have supported it for quite some time.
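
To make the idea concrete, here is a rough sketch of the convert-on-input step. It is only an illustration: the menu entries in SUPPORTED_ENCODINGS and the function name to_utf8 are made up for this example and are not anything the site actually uses.

```python
# Hypothetical menu of encodings a submission form might offer,
# mapped to Python codec names. The choices are illustrative only.
SUPPORTED_ENCODINGS = {
    "utf-8": "utf-8",
    "latin-1": "iso-8859-1",
    "windows-1252": "cp1252",
    "latin-2": "iso-8859-2",
    "shift_jis": "shift_jis",
}


def to_utf8(raw: bytes, choice: str) -> bytes:
    """Convert submitted bytes from the user-selected encoding to UTF-8."""
    try:
        codec = SUPPORTED_ENCODINGS[choice.lower()]
    except KeyError:
        raise ValueError(f"unsupported input encoding: {choice!r}")
    # Strict decoding raises UnicodeDecodeError for byte sequences that are
    # invalid in the chosen codec. Note that Latin-1 accepts every byte, so
    # a wrong menu choice cannot always be detected this way.
    text = raw.decode(codec)
    return text.encode("utf-8")


if __name__ == "__main__":
    # "café" as Latin-1 bytes becomes the two-byte UTF-8 form of "é".
    print(to_utf8(b"caf\xe9", "latin-1"))
```

Everything downstream of this function would then only ever see UTF-8, which is the point of converting once at the input boundary.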

Re: Re: Text Encoding on this site's HTML
by theorbtwo (Prior) on Dec 24, 2002 at 06:26 UTC

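    The correct thing to do is probably to look at what charset the browser declares in the headers it throws at us (the charset parameter of Content-Type, rather than Content-Encoding, which describes compression), and transcode into UTF-8 on the server.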
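
    A minimal sketch of that server-side transcoding, assuming the charset arrives as a parameter of the request's Content-Type header (browsers often omit it, so a fallback is needed). The function names and the ISO-8859-1 fallback are assumptions made for illustration, not a description of how this site works.

```python
from email.message import Message

# Assumed fallback when the browser declares no (or an unknown) charset.
FALLBACK_CHARSET = "iso-8859-1"


def declared_charset(content_type, default=FALLBACK_CHARSET):
    """Return the lowercased charset parameter of a Content-Type value."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter exists.
    return msg.get_content_charset() or default


def transcode_to_utf8(body: bytes, content_type) -> bytes:
    """Decode the body using the declared charset and re-encode as UTF-8."""
    charset = declared_charset(content_type)
    try:
        text = body.decode(charset)
    except LookupError:
        # The charset label is not a codec Python knows; use the fallback.
        text = body.decode(FALLBACK_CHARSET)
    return text.encode("utf-8")


if __name__ == "__main__":
    print(transcode_to_utf8(b"\xb3", "text/plain; charset=ISO-8859-2"))
```

    Bytes that are invalid in the declared charset still raise UnicodeDecodeError here; whether to reject such input or substitute replacement characters is a policy decision this sketch leaves open.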

    Additionally, we should set accept-charset on our forms to "UTF-8".


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).