in reply to Re: ISO-Latin-1 as node and UTF-8 in frontpage (not for me)
in thread ISO-Latin-1 as node and UTF-8 in frontpage

You're right that the cause is not a Perl problem. Indeed, when I fetch these pages using LWP::Simple and then examine them in a text editor, they look fine (sort of). Yet the site, or rather the authors of these nodes, don't get off completely free.

I wouldn't actually call it a bug in the browser, because the site claims to be emitting ISO-8859-1 text. Well, as mentioned in the root node: these characters are not in this character set. They are in the Windows character set, which is ISO-8859-1 plus some extra printable characters in the positions where ISO-8859-1 has control characters (mirrors of the ordinary control characters, which are the same codes with the highest bit cleared). I think it's typical for Microsoft to consider their own extensions as ISO-8859-1... :-) but: I expect problems on any other platform or browser. The symptoms will likely not be the same everywhere, but the characters will not show up as intended. And they need not: a browser that is told to expect ISO-8859-1 has no obligation to render them.
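To make the difference concrete, here is a small sketch in Python (the same check ports directly to Perl). It assumes the Windows character set in question is Windows-1252 and uses the two curly double quotes, bytes 0x93 and 0x94, as sample input; the variable names are only illustrative.

```python
import unicodedata

# Two bytes from the 0x80-0x9F range: the Windows "curly" double quotes.
raw = bytes([0x93, 0x94])

# Read as ISO-8859-1, each byte maps to the code point with the same number,
# which in this range is a control character, not a printable one.
as_latin1 = raw.decode("latin-1")

# Read as Windows-1252 (assumed to be the "Windows character set" meant here),
# the same bytes are the printable quotes U+201C and U+201D.
as_cp1252 = raw.decode("cp1252")

# Unicode general category "Cc" means "control character".
latin1_categories = [unicodedata.category(c) for c in as_latin1]

# Clearing the highest bit gives the ordinary control character being mirrored.
mirrored = [b & 0x7F for b in raw]

print([hex(ord(c)) for c in as_latin1])  # ['0x93', '0x94']
print([hex(ord(c)) for c in as_cp1252])  # ['0x201c', '0x201d']
print(latin1_categories)                 # ['Cc', 'Cc']
print([hex(m) for m in mirrored])        # ['0x13', '0x14']
```

So a page that declares ISO-8859-1 but contains these bytes is, strictly speaking, sending control characters.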

So I tested it. Every browser I tested it with has problems. These are:

You point out this is likely a bug in Mozilla. The fact that the same text shows up differently on different nodes is the only thing that I would qualify as a bug. Otherwise, it's quite striking that virtually all these browsers display these characters in almost identical ways: as two characters each.

Now, solutions? Like I said, the cause of the problem is people entering characters from this Windows extended set, but the site (which isn't really to blame, except maybe for accepting them) might remedy that. The simple approach is to replace these curly quotes with plain ASCII quotes. A bit more advanced would be to use HTML entities. So this site could help careless authors a little by replacing these, and only these, characters (ord range = 128 .. 159).
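A minimal sketch of such a filter, written in Python rather than as anything the site actually runs. It assumes the node text has been read as ISO-8859-1, so the offending characters sit at code points 128 .. 159, and that they were meant as Windows-1252. The function name `fix_windows_chars` and the table `ASCII_EQUIV` are hypothetical.

```python
# Hypothetical table for the "simple approach": curly quotes to plain ASCII.
ASCII_EQUIV = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
}

def fix_windows_chars(text, ascii_quotes=False):
    """Replace characters in ord range 128..159, and only those.

    By default each one becomes a numeric HTML entity for the character
    it stands for in Windows-1252 (an assumption about the author's intent).
    With ascii_quotes=True, curly quotes become plain ASCII quotes instead.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if not 128 <= code <= 159:
            out.append(ch)  # everything else is left untouched
            continue
        try:
            real = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            out.append(ch)  # position not defined in Windows-1252: leave as is
            continue
        if ascii_quotes and real in ASCII_EQUIV:
            out.append(ASCII_EQUIV[real])
        else:
            out.append("&#%d;" % ord(real))
    return "".join(out)

print(fix_windows_chars("\x93quoted\x94"))                     # &#8220;quoted&#8221;
print(fix_windows_chars("\x93quoted\x94", ascii_quotes=True))  # "quoted"
```

Legitimate ISO-8859-1 characters such as accented letters (code points 160 and up) pass through unchanged, which matches the "these, and only these" restriction.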

Update: I've been told the same thing happens in the Safari browser on Mac OS X.