Re^3: Database vs XML output representation of two-byte UTF-8 character

That was my post, which I lost control of, so let me clarify what I was suggesting. I read the OP to mean that the output of a diag() call (within a test suite) looked munged when printed to the console terminal. What I was suggesting was that you divert that output to a disk file instead, then use a hex editor to examine, byte by byte, exactly what lies between value=" and the subsequent ".
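
If no hex editor is handy, a few lines of script can do the same job. Here is a rough sketch in Python, purely as an illustration: the function name hex_between_quotes and the sample data are made up for this post, and in real use you would read the captured file in binary mode rather than build the bytes by hand.

```python
import re

def hex_between_quotes(raw: bytes) -> list[str]:
    """Return the hex form of whatever bytes sit between value=" and the next "."""
    # Work on bytes, not str, so that nothing gets decoded along the way.
    return [m.group(1).hex(" ") for m in re.finditer(rb'value="([^"]*)"', raw)]

# Hypothetical captured output. In practice: raw = open(path, "rb").read()
sample = '<field value="\u00e9"/>'.encode("utf-8")

print(hex_between_quotes(sample))  # ['c3 a9']
```

Two bytes, c3 a9, is what a correctly encoded UTF-8 "é" looks like on disk. Anything else at that spot tells you the damage happened before the data reached the file.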

We see in a later post that, indeed, the garble was being caused by the character-encoding of the terminal window. I anticipated that this could be the case, because there are just so-o-ooo many places where encoding/decoding can happen in both directions along that particular food-chain.
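
To make the terminal effect concrete, here is a small sketch (again Python, and the function name decode_views is invented for the example) that takes the very same two bytes and decodes them two different ways. I am assuming Latin-1 as the "wrong" encoding only because it is a common culprit; the terminal in the OP's case may have been using something else.

```python
def decode_views(raw: bytes) -> dict[str, str]:
    """Show how identical bytes read under two different decodings."""
    return {
        "utf-8": raw.decode("utf-8"),      # what the data actually is
        "latin-1": raw.decode("latin-1"),  # what a mis-configured viewer shows
    }

raw = "\u00e9".encode("utf-8")  # the two bytes c3 a9
views = decode_views(raw)
print(len(views["utf-8"]), len(views["latin-1"]))  # 1 2
```

The bytes never changed. One character became two on screen only because the viewer guessed the wrong encoding, which is why what you see on the console proves nothing about what is stored.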

It is also possible, e.g. in MySQL, to dump the contents of a field in hexadecimal form (the HEX() function does this), and once again this is the strategy that I recommend. Get some view that shows you what the bytes are while making zero attempt to decode them as anything. Only then can you really know.
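
For a self-contained illustration of the database side, the sketch below uses SQLite (via Python's bundled sqlite3 module) as a stand-in, since it needs no server. This is not MySQL, but SQLite's hex() plays the same role as MySQL's HEX(). The table name t, the column name v and the helper name stored_hex are made up for the example.

```python
import sqlite3

def stored_hex(value: str) -> str:
    """Store a string in an in-memory SQLite table and return its bytes as hex."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (v TEXT)")       # hypothetical table
        conn.execute("INSERT INTO t (v) VALUES (?)", (value,))
        # hex() renders the stored bytes without trying to decode them.
        (result,) = conn.execute("SELECT hex(v) FROM t").fetchone()
        return result
    finally:
        conn.close()

print(stored_hex("\u00e9"))  # C3A9
```

Against a real MySQL table the equivalent query would be along the lines of SELECT HEX(your_column) FROM your_table, with your own column and table names filled in.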