in reply to display of utf8
For example, some of our converted output in utf8 contains a bunch of 0xC2 and 0xC3 (194 & 195) chars
perl -e 'binmode STDOUT,":utf8"; print chr(0xC2),"\n";'
... gives a LATIN CAPITAL LETTER A WITH CIRCUMFLEX (according to gnome-character-map), which is not in the input.
UTF-8 is a multibyte encoding. When you see a 0xC2 byte in the output, it is the lead byte of a two-byte sequence. Two-byte sequences encode the characters in the range U+0080-U+07FF; a 0xC2 lead byte covers U+0080-U+00BF and a 0xC3 lead byte covers U+00C0-U+00FF. A 0xC2 byte does not mean that the character U+00C2 should be displayed. That byte would only be shown as U+00C2 on its own if your terminal interpreted the output as Latin-1 (or a compatible encoding).
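To make the byte/character distinction concrete, here is a minimal sketch using the core Encode module. The variable names are only illustrative, and U+00A9 is just an arbitrary character picked from the range that a 0xC2 lead byte covers.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The character U+00C2 becomes two bytes when encoded as UTF-8.
my $bytes = encode('UTF-8', chr(0xC2));
my $hex   = join ' ', unpack('(H2)*', $bytes);
print "U+00C2 encoded as UTF-8: $hex\n";        # c3 82

# A 0xC2 byte only has a meaning together with the continuation
# byte that follows it; here the pair C2 A9 is one character.
my $char = decode('UTF-8', "\xC2\xA9");
printf "bytes c2 a9 decode to U+%04X\n", ord($char);   # U+00A9
```

So the one-liner in the question writes the bytes C3 82 to STDOUT, and 0xC2 bytes in converted output are lead bytes for characters in U+0080-U+00BF rather than characters in their own right.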