Re^5: UTF-8 and Unicode the hard way

For your given example, this utility might prove illuminating. It shows that the character you describe has the code point U+0103 (hex 0103) and is encoded in UTF-8 as the two bytes C4 83 (hex c483). These are the two representations your two data sources are using: one form refers to the code point, the other to the UTF-8 bytes. You will have to treat the two sources differently if you want to handle them both successfully.
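If you want to check the relationship between the two hex forms yourself, here is a minimal sketch. It is written in Python purely for illustration, and the variable names are my own, not anything from your data sources. It only shows that 0103 and c483 describe the same character from two different angles.

```python
import unicodedata

# The character under discussion, written by its code point.
ch = "\u0103"

# Code point as four hex digits (the "0103" form).
code_point = f"{ord(ch):04x}"

# UTF-8 encoding as hex bytes (the "c483" form).
utf8_hex = ch.encode("utf-8").hex()

# Official Unicode name, handy for confirming it is the character you expect.
name = unicodedata.name(ch)

# Going the other way: both representations lead back to the same character.
from_bytes = bytes.fromhex("c483").decode("utf-8")
from_code_point = chr(0x0103)

print(code_point, utf8_hex, name)
```

A source that gives you the code point needs something like the `chr` step, while a source that gives you raw UTF-8 bytes needs a decode step, which is why the two cannot be handled identically.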

🦛

Re^6: UTF-8 and Unicode the hard way by Anonymous Monk on May 10, 2022 at 17:42 UTC
Fortunately, I don't have to handle the second data source at all; it's already in the format I need. But thank you, that's a useful tool.