in reply to UTF8 Validity

It is valid UTF-16, so perhaps that's what you're dealing with. A good resource for checking this is the fileformat.info character reference, where each character's page lists its UTF-8 and UTF-16 encodings.
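
If you want to check this yourself, here is a rough Python sketch that tries strict decoding under a few candidate encodings. The helper name `valid_encodings` and the sample bytes are my own, made up for illustration; I don't know the actual bytes from your post, so substitute them.

```python
def valid_encodings(data: bytes, candidates=("utf-8", "utf-16-le", "utf-16-be")):
    """Return the candidate encodings under which `data` decodes without error."""
    ok = []
    for enc in candidates:
        try:
            # Strict mode raises UnicodeDecodeError on any malformed sequence.
            data.decode(enc, errors="strict")
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


# Hypothetical sample: U+00E9 encoded as UTF-16LE gives the bytes E9 00.
sample = "\u00e9".encode("utf-16-le")
print(valid_encodings(sample))
```

For this sample UTF-8 is rejected (0xE9 starts a three-byte sequence, but no continuation bytes follow) while UTF-16LE is accepted. Keep in mind that a successful decode only means the bytes are well-formed in that encoding, not that it is the encoding the author intended: almost any even-length byte string decodes as UTF-16 unless it contains an unpaired surrogate.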