in reply to Re: Guess between UTF8 and Latin1/ISO-8859-1
in thread Guess between UTF8 and Latin1/ISO-8859-1

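The byte order mark is required for UTF-16, the 16-bit Unicode encoding; it is optional for UTF-8 and changes nothing there. UTF-8 is the default when there is neither a byte order mark nor an encoding declaration (and nothing external, such as an HTTP header, supplies the encoding). You are correct that if the encoding is not specified and the file is not valid UTF-8, it is an error. The XML spec makes it a fatal one, so a conforming parser must stop rather than guess.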
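To make the order of checks concrete, here is a rough sketch in Python. It is only an illustration, not how any particular parser does it: the names guess_xml_encoding and decode_xml are made up for this post, UTF-32 and EBCDIC are ignored, and the regex is a shortcut that assumes the XML declaration is ASCII-compatible.

```python
import codecs
import re

# Hypothetical helper: mimics the order in which an XML parser settles on an
# encoding. Simplified on purpose (no UTF-32, no EBCDIC, no external headers).
def guess_xml_encoding(data: bytes) -> str:
    # 1. A byte order mark wins if present.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # Python codec that strips the UTF-8 BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # this codec reads the BOM to pick the byte order
    # 2. Otherwise look for encoding="..." in the XML declaration.
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        data,
    )
    if m:
        return m.group(1).decode("ascii")
    # 3. No BOM and no declaration: the default is UTF-8.
    return "utf-8"

def decode_xml(data: bytes) -> str:
    # Raises UnicodeDecodeError when the bytes do not match the encoding,
    # which corresponds to the "it is an error" case described above.
    return data.decode(guess_xml_encoding(data))
```

So a Latin-1 file without an encoding declaration fails at step 3 as soon as it contains a non-ASCII byte, instead of being silently misread.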

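Numeric character references (such as &#233;) always refer to Unicode code points, whatever the document's encoding. Unicode is the only character set used in XML. Other encodings can be declared, but the parser maps their bytes into Unicode, so everything past the decoding step deals only with Unicode characters.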
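A quick way to see this, sketched with Python's standard xml.etree.ElementTree (the helper name text_of and the sample documents are just made up for the example):

```python
import xml.etree.ElementTree as ET

# Hypothetical helper: parse a document and return the root element's text.
def text_of(doc: bytes) -> str:
    return ET.fromstring(doc).text

# The same character written three ways: a raw Latin-1 byte, raw UTF-8 bytes,
# and numeric character references inside a Latin-1 document.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9</a>'
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?><a>\xc3\xa9</a>'
ref_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>&#233;&#xE9;</a>'

# A reference can name a character the declared encoding cannot represent:
# the euro sign is not in ISO-8859-1, but &#8364; is still U+20AC.
euro_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>&#8364;</a>'

if __name__ == "__main__":
    for doc in (latin1_doc, utf8_doc, ref_doc, euro_doc):
        print([hex(ord(ch)) for ch in text_of(doc)])
```

Whatever the bytes on disk looked like, the application only ever sees Unicode strings, and the numbers in the references are code points, not byte values in the declared encoding.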
