The pages on this site declare the Latin-1 character set. Increasingly, though, we see UTF-8 text pasted into code listings.

By design, <code> blocks are immune to & (entity) expansion, so you can't just write HTML entities for funny characters.

So... why can't this site do it for us? We could have a <code utf-8> block, a <code Windows> block, and so on. The display logic would then always turn characters beyond basic ASCII into named or numeric character entities, so they display properly regardless of the browser's setting. (Alternatively, characters that exist in the page's declared charset could be converted to that charset instead.)
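
A minimal sketch of that display step, assuming the raw bytes of the block are available and that the tag attribute maps to a codec. The names `render_code_block` and `TAG_ENCODINGS`, and the attribute spellings, are hypothetical, not anything the site actually has. It decodes with the declared encoding, escapes &, < and >, and writes every non-ASCII character as a numeric reference, which reads the same under any page charset.

```python
import html

# Hypothetical mapping from the proposed <code ...> attribute to a Python codec.
TAG_ENCODINGS = {
    "utf-8": "utf-8",
    "windows": "cp1252",
    "latin-1": "latin-1",
}


def render_code_block(raw: bytes, tag_attr: str) -> str:
    """Decode a code block with the encoding its tag names, then make it
    safe to display whatever charset the page declares."""
    text = raw.decode(TAG_ENCODINGS[tag_attr.lower()])
    # Code blocks show their contents literally, so &, < and > become entities.
    escaped = html.escape(text, quote=False)
    # Every non-ASCII character becomes a numeric reference such as &#233;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(render_code_block("café <b>".encode("utf-8"), "utf-8"))  # caf&#233; &lt;b&gt;
print(render_code_block(b"\x93quoted\x94", "Windows"))         # &#8220;quoted&#8221;
```

Numeric references were chosen over named ones because every Unicode character has one, whereas named entities cover only a subset.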

A variation would be an attribute in the opening <code> tag that declares an escape character for that block, so we could write such characters as escape sequences if we wanted to.
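
To make the escape-character idea concrete, here is a sketch under an invented syntax: the tag declares an escape character (backslash by default here), ESC x{HEX} stands for the character U+HEX (borrowing the shape of Perl's own `\x{...}` string escape), and a doubled ESC stands for itself. The function name `expand_escapes` and the whole syntax are hypothetical, and the escape character is assumed not to be &, < or >.

```python
import html
import re


def expand_escapes(text: str, esc: str = "\\") -> str:
    """Render a code block whose (hypothetical) tag declares `esc` as its
    escape character: ESC x{HEX} -> &#DEC;, ESC ESC -> literal ESC."""
    # Escape &, <, > first so the references added below are not re-escaped.
    escaped = html.escape(text, quote=False)
    e = re.escape(esc)
    pattern = re.compile(e + r"(?:x\{([0-9A-Fa-f]+)\}|" + e + ")")

    def replace(m: re.Match) -> str:
        if m.group(1) is None:  # doubled escape character
            return esc
        return f"&#{int(m.group(1), 16)};"

    # An escape character not followed by x{...} or itself is left as-is.
    return pattern.sub(replace, escaped)


print(expand_escapes(r"smile \x{263A}, path C:\\temp"))  # smile &#9786;, path C:\temp
```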

I think a smart default would work, too. If a code block contains bytes above 127 that form legal UTF-8 sequences, the site could assume by default that the text is in fact UTF-8 and convert those characters to entities. If that guess is wrong, it would show up in the preview window. Getting it wrong is no worse than the current situation of forgetting to escape square brackets.
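
A sketch of that default, assuming the server sees the pasted bytes before any decoding. `decode_pasted` and `to_entities` are hypothetical names. Valid UTF-8 wins because 8-bit text rarely forms valid multi-byte UTF-8 sequences by accident; anything else is taken as Windows-1252 (a superset of Latin-1's printable range), with the few bytes that codec leaves undefined replaced rather than rejected.

```python
import html


def decode_pasted(raw: bytes) -> tuple[str, str]:
    """Guess how pasted code was encoded. Returns (text, encoding_name)."""
    if raw.isascii():
        return raw.decode("ascii"), "ascii"
    try:
        # High bytes that form valid UTF-8 sequences: assume UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Otherwise treat it as 8-bit Windows text; bytes cp1252 leaves
        # undefined become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace"), "cp1252"


def to_entities(text: str) -> str:
    """Escape &, <, > and turn every non-ASCII character into &#NNN;."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


for raw in ("naïve".encode("utf-8"), b"na\xefve", b"x < y"):
    text, enc = decode_pasted(raw)
    print(enc, to_entities(text))
# utf-8 na&#239;ve
# cp1252 na&#239;ve
# ascii x &lt; y
```

Both the UTF-8 and the 8-bit paste end up as the same entity text, which is the "allow either for input" behaviour argued for below.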

I don't think switching the served HTML to UTF-8 is a solution, since people would still paste both 8-bit characters and UTF-8 into input fields. The solution is to accept either on input.


In reply to Text Encoding on this site's HTML by John M. Dlugosz
