For example, some of our converted output in UTF-8 contains a lot of 0xC2 and 0xC3 (194 and 195) characters.
perl -e 'binmode STDOUT, ":utf8"; print chr(0xC2), "\n";'
gives LATIN CAPITAL LETTER A WITH CIRCUMFLEX (according to gnome-character-map), which is not in the input.
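This minimal sketch uses the core Encode module to show what the one-liner above actually writes. chr(0xC2) is the character U+00C2, not a raw byte. Encoding it as UTF-8 produces the two bytes C3 82, not a lone 0xC2 byte. For this character, the :utf8 output layer and encode('UTF-8', ...) produce the same bytes. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# chr(0xC2) is the *character* U+00C2, not a raw byte.
my $char = chr 0xC2;

# Encoding it as UTF-8 (what the :utf8 layer does for this character)
# yields two bytes, C3 82, which a UTF-8 terminal renders as U+00C2.
my $bytes = encode('UTF-8', $char);

printf "%d character -> %d bytes: %s\n",
    length $char, length $bytes,
    join ' ', map { sprintf '%02X', ord } split //, $bytes;
```

Running it prints `1 character -> 2 bytes: C3 82`. The glyph you see is the character you asked for, not evidence of a stray 0xC2 byte in your data.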
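UTF-8 is a multi-byte encoding. When you see a 0xC2 byte in the output, it is the start of a two-byte sequence that encodes a character from the range U+0080-U+07FF. Specifically, a 0xC2 lead byte covers U+0080-U+00BF, and 0xC3 covers U+00C0-U+00FF, so those bytes are simply the first half of ordinary Latin-1 Supplement characters such as accented letters. It does not mean that the character U+00C2 should be displayed. That would only happen if your terminal were decoding the output as Latin-1 (or a compatible encoding).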
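To illustrate the terminal point, here is a sketch that decodes the same bytes twice, once as UTF-8 and once as Latin-1 (ISO-8859-1). The sample string is made up for illustration. Read as UTF-8, each 0xC2/0xC3 byte starts a two-byte sequence. Read as Latin-1, every byte becomes its own character, so U+00C3 (Ã) and U+00C2 (Â) show up. This is what a Latin-1 terminal would display.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the UTF-8 bytes for "café °" (é = C3 A9, ° = C2 B0).
my $bytes = "caf\xC3\xA9 \xC2\xB0";

# Decoded as UTF-8: each C2/C3 lead byte pairs with the byte after it.
my $as_utf8 = decode('UTF-8', $bytes);

# Decoded as Latin-1: every byte maps to its own code point (mojibake).
my $as_latin1 = decode('ISO-8859-1', $bytes);

printf "UTF-8:   %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $as_utf8;
printf "Latin-1: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $as_latin1;
```

This prints `UTF-8:   U+0063 U+0061 U+0066 U+00E9 U+0020 U+00B0` and `Latin-1: U+0063 U+0061 U+0066 U+00C3 U+00A9 U+0020 U+00C2 U+00B0`. Only the Latin-1 reading contains U+00C2.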
In reply to Re: display of utf8 by moritz, in thread display of utf8 by omacneil