I am working with WHOIS servers and have run into what I believe is a character-encoding issue. One WHOIS server returns properly encoded UTF-8 text (I think), and another does not: the first returns the ™ character as three high-bit bytes (the sequence e2 84 a2), while the second returns accented characters like ç and á as single bytes (e7 and e1).

This inconsistency means that when I display this text in a browser window (charset=utf-8), the ™ character from the first server appears correctly, but the accented characters from the second appear as the dreaded black diamond with a question mark (�).

What is the best way to 1) detect high-bit bytes that are not part of a valid UTF-8 sequence, and 2) "upgrade" those bytes to a properly encoded UTF-8 sequence?
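
To make the goal concrete, here is a minimal Python sketch of the behavior I am after. It rests on an assumption I have not confirmed: that the second server's single bytes (e7, e1) are ISO-8859-1, which is what those values suggest. The function name `to_utf8` and the `fallback` parameter are hypothetical names chosen for illustration.

```python
def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return `raw` as UTF-8 bytes, guessing the source encoding.

    Hypothetical helper: valid UTF-8 passes through unchanged; anything
    else is ASSUMED to be in `fallback` (ISO-8859-1 by default).
    """
    try:
        # Strict decoding raises UnicodeDecodeError on any byte sequence
        # that is not well-formed UTF-8, which serves as the detection step.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is ISO-8859-1. Every byte value maps
        # to a code point in that encoding, so this decode cannot fail.
        text = raw.decode(fallback)
    # The "upgrade" step: re-encode the decoded text as UTF-8.
    return text.encode("utf-8")


if __name__ == "__main__":
    print(to_utf8(b"\xe2\x84\xa2"))  # b'\xe2\x84\xa2' (already UTF-8)
    print(to_utf8(b"\xe7 \xe1"))     # b'\xc3\xa7 \xc3\xa1' (upgraded)
```

This treats each response as all-or-nothing: a response that mixes valid UTF-8 with stray single bytes would be decoded entirely with the fallback, which would garble the parts that were already valid UTF-8. Applying the same function line by line would narrow that problem without removing it.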

