Although I'll second the suggestion of using the 'Accept-charset' header, I'm not so sure that user agents reliably respond in the same encoding as the page.
From RFC 2616 (HTTP/1.1):
3.4 Character Sets
HTTP uses the same definition of the term "character set" as that
described for MIME:
The term "character set" is used in this document to refer to a
method used with one or more tables to convert a sequence of octets
into a sequence of characters. Note that unconditional conversion in
the other direction is not required, in that not all characters may
be available in a given character set and a character set may provide
more than one sequence of octets to represent a particular character.
This definition is intended to allow various kinds of character
encoding, from simple single-table mappings such as US-ASCII to
complex table switching methods such as those that use ISO-2022's
techniques. However, the definition associated with a MIME character
set name MUST fully specify the mapping to be performed from octets
to characters. In particular, use of external profiling information
to determine the exact mapping is not permitted.
Note: This use of the term "character set" is more commonly
referred to as a "character encoding." However, since HTTP and
MIME share the same registry, it is important that the terminology
also be shared.
HTTP character sets are identified by case-insensitive tokens. The
complete set of tokens is defined by the IANA Character Set registry
[19].
charset = token
Although HTTP allows an arbitrary token to be used as a charset
value, any token that has a predefined value within the IANA
Character Set registry [19] MUST represent the character set defined
by that registry. Applications SHOULD limit their use of character
sets to those defined by the IANA registry.
Implementors should be aware of IETF character set requirements [38]
[41].
3.4.1 Missing Charset
Some HTTP/1.0 software has interpreted a Content-Type header without
charset parameter incorrectly to mean "recipient should guess."
Senders wishing to defeat this behavior MAY include a charset
parameter even when the charset is ISO-8859-1 and SHOULD do so when
it is known that it will not confuse the recipient.
Unfortunately, some older HTTP/1.0 clients did not deal properly with
an explicit charset parameter. HTTP/1.1 recipients MUST respect the
charset label provided by the sender; and those user agents that have
a provision to "guess" a charset MUST use the charset from the
content-type field if they support that charset, rather than the
recipient's preference, when initially displaying a document. See
section 3.7.1.
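To make the "respect the charset label" rule concrete, here's a rough Python sketch of how a recipient might pull the charset out of a Content-Type header and fall back to ISO-8859-1 when none is given. The function name `charset_from_content_type` is made up for illustration, and leaning on the standard library's `email.message.Message` to parse the header is just one convenient way to do it, not anything the RFC prescribes.

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the lower-cased charset label from a Content-Type value.

    Charset tokens are case-insensitive, so the label is normalised to
    lower case. If the header carries no charset parameter, fall back
    to `default` instead of guessing.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset parameter lower-cased,
    # or the fallback object when the parameter is absent.
    return msg.get_content_charset(default)


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

The fallback only mirrors the ISO-8859-1 default mentioned in the quoted section; a real client would also have to check that it actually supports the charset it was handed.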
I'm still not sure how to handle form data in the QUERY_STRING -- from section 2.1 of RFC 2396 (URI Syntax):
For original character sequences that contain non-ASCII characters,
however, the situation is more difficult. Internet protocols that
transmit octet sequences intended to represent character sequences
are expected to provide some way of identifying the charset used, if
there might be more than one [RFC2277]. However, there is currently
no provision within the generic URI syntax to accomplish this
identification. An individual URI scheme may require a single
charset, define a default charset, or provide a way to indicate the
charset used.
It is expected that a systematic treatment of character encoding
within URI will be developed as a future modification of this
specification.
(If anyone knows of a followup RFC, I'd love to know what the number is)
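To show why the QUERY_STRING case worries me, here's a small Python sketch (the sample word "café" is just my own example): the same text percent-encodes to different octets depending on the charset the browser used, and nothing in the URI itself tells the server which one it was.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"  # "café"

# The same characters produce different escaped octets per charset.
as_utf8 = quote(text, encoding="utf-8")
as_latin1 = quote(text, encoding="iso-8859-1")
print(as_utf8)    # caf%C3%A9
print(as_latin1)  # caf%E9

# A server that guesses the wrong charset gets mojibake, not an error.
wrong_guess = unquote(as_utf8, encoding="iso-8859-1")
right_guess = unquote(as_utf8, encoding="utf-8")
print(wrong_guess == text)  # False
print(right_guess == text)  # True
```

So unless the form and the server agree on the encoding out of band, the script on the receiving end is left guessing.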
And for the original poster, although Joel's article is a good start, it's intended as a quick overview -- I'd also suggest you take a look at "A tutorial on character code issues".