Juerd wrote:
> Are you sure your data is properly *decoded* when
> you read it from file/socket/database?
Thanks for answering, Juerd. The script reads it from an RSS file, and I have just double-checked: if the bytes are encoded separately (one entity per UTF-8 byte of ä), XML::RSS decodes it correctly. How do I know it's correct? Well, I can read the character in the HTML output. In the HTML source it is unencoded, but since Firefox treats the page as UTF-8 (set via the HTML header), it displays the character as expected.
One more observation from the test I just ran: if I replace one or all of the separate-byte entities with the (supposedly correct) single code point version, I get this error in Apache's log: Wide character in print. And what I see in the browser are little squares that contain tiny hex numbers, e.g. C3 and A4.
If all entities are separate-bytes encoded, there is no error.
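To illustrate why the separate-byte form can look right by accident, here is a minimal sketch. The two strings are hypothetical stand-ins for what a parser might hand back; they are not taken from the actual feed or from XML::RSS itself. It only shows that the two-character string, written out one byte per character, yields the same bytes as the proper UTF-8 encoding of ä.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-ins for the decoded entity text (assumption, not real feed data).
my $two_chars = "\x{C3}\x{A4}";   # one entity per byte: characters U+00C3 and U+00A4
my $one_char  = "\x{E4}";         # one entity for the character: U+00E4

# Writing the two-character string one byte per character (Latin-1)
# gives the bytes C3 A4.
my $raw_two  = encode('ISO-8859-1', $two_chars);

# Properly encoding the single character as UTF-8 also gives C3 A4.
my $utf8_one = encode('UTF-8', $one_char);

printf "lengths: %d and %d\n", length($two_chars), length($one_char);
print $raw_two eq $utf8_one ? "same bytes on the wire\n" : "different bytes\n";
```

So a browser told to expect UTF-8 would show ä in both cases, even though the first string holds two characters inside Perl.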
--
moritz wrote:
> ...This encodes two characters, not one,
> so it's certainly not what you want.
Thanks for your answer! Great, this confirms my finding.
I will try out what you suggested (Encode::decode + utf8::upgrade).
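For reference, a rough sketch of how I understand the decode step. This is my own guess, not necessarily what moritz had in mind: it uses only Encode, leaves out utf8::upgrade, and the variable names and sample string are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical mangled value: the UTF-8 bytes of one character,
# each stored as its own character (U+00C3, U+00A4).
my $mangled = "\x{C3}\x{A4}";

# Every character is below 256, so Latin-1 encoding recovers the raw bytes.
my $bytes = encode('ISO-8859-1', $mangled);

# Decoding those bytes as UTF-8 yields the single intended character.
my $text = decode('UTF-8', $bytes);

# Encode explicitly on output, so Perl never has to guess how to print
# characters above 255 (the situation "Wide character in print" warns about).
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

The idea is to decode once on the way in and encode once on the way out, so that inside the script every string holds characters rather than bytes.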
Jot