in reply to Re^2: Database vs XML output representation of two-byte UTF-8 character
in thread Database vs XML output representation of two-byte UTF-8 character

That was my post, which I lost control of, so let me clarify what I was suggesting. I read the OP to mean that the output of a diag() call within a test suite was producing results that, when printed to the console terminal, seemed to be munged up. So my suggestion was to divert that output to a disk file instead, then use a hex editor to examine, byte by byte, exactly what sits between value=" and the closing ".
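If no hex editor is handy, the same inspection can be sketched in a few lines of script. This is only an illustration, written in Python for brevity: the function name hex_between_quotes, the sample markup, and the file name in the comment are all made up for the example and do not come from the OP's code. The point is that the data is handled as raw bytes and never decoded.

```python
import re

def hex_between_quotes(raw):
    """Return the hex form of every byte run found between value=" and the next "."""
    # The pattern is a bytes pattern, so nothing is decoded along the way.
    return [m.group(1).hex(" ") for m in re.finditer(rb'value="([^"]*)"', raw)]

# Hypothetical captured output. In practice, read the diverted file in
# binary mode instead, e.g. open("diag_output.txt", "rb").read()
sample = b'<input name="city" value="Z\xc3\xbcrich">'

print(hex_between_quotes(sample))  # ['5a c3 bc 72 69 63 68']
```

Here the pair c3 bc is the two-byte UTF-8 form of "ü", so these bytes are fine regardless of how a terminal might render them.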

We see in a later post that the garbling was indeed caused by the character encoding of the terminal window. I anticipated that this could be the case, because there are just so-o-ooo many places where encoding and decoding can happen, in both directions, along that particular food chain.

It is also possible, e.g. in MySQL with the HEX() function, to dump the contents of a field in hexadecimal form, and once again this is the strategy that I recommend. Get some view that shows you what the bytes are, making zero attempt to decode them as anything. Only then can you really know.
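As a rough sketch of how to read such a hex dump once you have it: the snippet below takes a hex string of the kind MySQL's HEX() returns and reports what those bytes would look like under two different interpretations. The table and column names in the comment, the function name describe_hex, and the sample values are all assumptions made for illustration.

```python
# Hex strings as they might come back from something like
#   SELECT HEX(some_column) FROM some_table;
# (table and column names are hypothetical)

def describe_hex(hex_text):
    """Turn a hex dump back into bytes and show how two decoders would read it."""
    raw = bytes.fromhex(hex_text)
    try:
        as_utf8 = raw.decode("utf-8")
    except UnicodeDecodeError:
        as_utf8 = None  # the bytes are not valid UTF-8 at all
    # Latin-1 maps every byte to a character, so this decode never fails.
    return {"bytes": len(raw), "utf8": as_utf8, "latin1": raw.decode("latin-1")}

print(describe_hex("C3BC"))      # two bytes: "ü" correctly stored as UTF-8
print(describe_hex("C383C2BC"))  # four bytes: "ü" encoded to UTF-8 twice
print(describe_hex("FC"))        # one byte: "ü" in Latin-1, not valid UTF-8
```

The byte count is usually the quickest tell: two bytes for a two-byte UTF-8 character means the stored data is sound and the damage is happening further down the line, while four bytes suggests it was encoded twice before it ever reached the table.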