You're right that the cause is not a Perl problem. Indeed, when I fetch these pages using LWP::Simple and then examine them with a text editor, they look fine (sort of). Yet the site, or rather the authors of these nodes, don't get off completely free.

I wouldn't actually call it a bug in the browser, because the site claims to be emitting ISO-8859-1 text. Well, as mentioned in the root node: these characters are not in that character set. They are in the Windows character set (Windows-1252), which is ISO-8859-1 plus some extra printable characters in the positions where ISO-8859-1 has control characters (mirrors of the ordinary control characters, which are the same codes with the highest bit cleared). I think it's typical for Microsoft to consider their own extensions as ISO-8859-1... :-) but: I expect problems on any other platform or browser. The symptoms will likely not be the same, but the characters will not show up as intended, and browsers are not required to show them that way.
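To make the difference between the two character sets concrete, here is a minimal sketch. It is in Python rather than Perl, purely for illustration, and the sample bytes are my own choice, not taken from any node. It decodes the same two bytes under both encodings and shows the "highest bit cleared" relation for a few codes in the 128 .. 159 range.

```python
# Bytes 0x93 and 0x94 are curly double quotes in Windows-1252,
# but control characters in ISO-8859-1.
raw = b"\x93quoted\x94"

as_cp1252 = raw.decode("cp1252")   # curly quotes U+201C / U+201D
as_latin1 = raw.decode("latin-1")  # control characters U+0093 / U+0094

print(repr(as_cp1252))
print(repr(as_latin1))

# Clearing the highest bit of a code in 128..159 gives an ordinary
# control code in 0..31 (sample codes chosen for illustration).
c1_to_c0 = {code: code & 0x7F for code in (0x80, 0x93, 0x9F)}
print(c1_to_c0)
```

A browser that honours the declared ISO-8859-1 charset sees the second interpretation, so it has no printable glyph to show for these bytes.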

So I tested it. Every browser I tested it with has problems. These are:

You point out this is likely a bug in Mozilla. The fact that the pages show up differently for the same text on the different nodes is the only thing that I would qualify as a bug. Otherwise, it's quite striking that virtually all these browsers display these characters in almost identical ways: as two characters each.

Now, solutions? Like I said, the cause of the problem is people entering characters from this Windows extended set, but the site (which isn't really to blame, except maybe for accepting them) might remedy that. The simple approach is to replace these curly quotes with the plain ASCII quotes. A bit more advanced would be to use HTML entities. So this site could help careless authors a little by replacing these, and only these, characters (ord range = 128 .. 159).
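A minimal sketch of that replacement, again in Python for illustration only. The function name `fix_windows_chars`, the `use_entities` switch, and the small quote table are my own hypothetical choices, not anything the site actually does. It assumes the text was read as ISO-8859-1, so the offending characters sit at ord 128 .. 159, and it touches nothing outside that range.

```python
import re

# Characters with ord 128..159, i.e. the Windows extras as they appear
# after the text has been read as ISO-8859-1.
C1_RANGE = re.compile("[\u0080-\u009f]")

# Plain-ASCII stand-ins for the curly quotes (hypothetical, minimal table).
ASCII_QUOTES = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}

def fix_windows_chars(text, use_entities=True):
    """Replace only characters in ord range 128..159."""
    def repl(match):
        byte = bytes([ord(match.group())])
        try:
            char = byte.decode("cp1252")  # what the author meant to type
        except UnicodeDecodeError:
            return match.group()          # not defined in Windows-1252: keep
        if not use_entities and char in ASCII_QUOTES:
            return ASCII_QUOTES[char]     # simple approach: plain quotes
        return f"&#{ord(char)};"          # numeric HTML entity
    return C1_RANGE.sub(repl, text)

print(fix_windows_chars("\x93hi\x94"))
print(fix_windows_chars("\x93hi\x94", use_entities=False))
```

Legitimate ISO-8859-1 characters such as é (ord 233) pass through unchanged, since they fall outside the 128 .. 159 range.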

Update: I've been told the same thing happens with the Safari browser on Mac OS X.


In reply to Re: Re: ISO-Latin-1 as node and UTF-8 in frontpage (not for me) by bart
in thread ISO-Latin-1 as node and UTF-8 in frontpage by bart
