in reply to Unicode in <code> sections.

The code in that node appears to contain non-breaking space characters (160), probably so that HTML won't collapse the extra white space (similar to a trick I use myself in rendering <code> sections at PM, though I use &nbsp; instead of a single-byte character with a value of 160).
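
For anyone who wants to check that the two approaches really name the same character, here is a minimal sketch in Python (my choice for illustration; nothing here is PM's actual rendering code). It assumes the page bytes are read as ISO-8859-1, as the Content-type header says they should be.

```python
import html

# Byte value 160 (0xA0), interpreted as ISO-8859-1 (Latin-1).
raw_byte = bytes([160])
from_latin1 = raw_byte.decode("iso-8859-1")

# The HTML entity used by the other approach.
from_entity = html.unescape("&nbsp;")

# Both should be U+00A0, NO-BREAK SPACE.
same_character = from_latin1 == from_entity == "\u00a0"
print(same_character, hex(ord(from_latin1)))
```

So a browser that honours the declared charset should render the raw byte exactly as it renders the entity.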

I'm a little surprised that either your browser is that broken or that your font on NT is missing this character. If I had to guess at why you see this problem when I don't, my first guess would be that you have installed some additional fonts and so your browser is selecting a different font than mine is. The PM CSS provides a fairly long list of fonts to choose from (trying to get reasonable appearance on most common platforms), so installing a new font can easily make the site suddenly look much different.

I don't know if you can get your browser to tell you what font it is rendering the code in, but that would be helpful information (you could verify that the font is missing a rendering for character 160).

My next guess would be that my browser (IE6) is changing 160 into 32 after it takes into account the wrapping implications. But I doubt that and also see evidence that at least some of my fonts do have a valid entry for character 160.

After that, I'm forced to start thinking that your browser is badly broken and either is ignoring the charset=ISO-8859-1 in the Content-type: header that PM provides or trying to be tricky in how it displays character 160. But that all seems pretty unlikely as well. Perhaps you have told your browser to "override" the character set?
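
To make the "override" scenario concrete, here is a rough sketch of how a client might pick a charset. The function names (declared_charset, decode_body) and the user_override parameter are hypothetical, invented for this illustration; real browsers are far more involved.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str, default: str = "iso-8859-1") -> str:
    """Pull the charset parameter out of a Content-type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset() or default


def decode_body(body: bytes, content_type: str,
                user_override: Optional[str] = None) -> str:
    """Decode with the user's forced encoding if set, else the declared one."""
    charset = user_override or declared_charset(content_type)
    # errors="replace" substitutes U+FFFD for undecodable bytes,
    # roughly what a browser shows as an "unknown character" glyph.
    return body.decode(charset, errors="replace")


header = "text/html; charset=ISO-8859-1"
body = b"a\xa0b"

honoured = decode_body(body, header)                            # header respected
overridden = decode_body(body, header, user_override="utf-8")   # header ignored
```

With the header honoured, the middle character comes out as a no-break space; with the override in place, the same byte turns into the replacement character.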

BTW, I wouldn't use "unicode" in regard to this situation. The character set PM emits is Latin-1 and we aren't using any Unicode encoding, simply 8-bit characters.
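
A small sketch of the distinction, again in Python purely for illustration: in Latin-1 each byte is simply the code point of its character, while the Unicode encodings spell the same no-break space with different byte sequences.

```python
nbsp = "\u00a0"  # NO-BREAK SPACE

latin1_bytes = nbsp.encode("iso-8859-1")  # one byte: 0xA0
utf8_bytes = nbsp.encode("utf-8")         # two bytes: 0xC2 0xA0
utf16_bytes = nbsp.encode("utf-16-be")    # two bytes: 0x00 0xA0

# Latin-1 maps every byte value 0..255 to the code point of the same number.
all_bytes = bytes(range(256))
one_to_one = [ord(ch) for ch in all_bytes.decode("iso-8859-1")] == list(range(256))

print(latin1_bytes, utf8_bytes, utf16_bytes, one_to_one)
```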

                - tye

Re: Re: Unicode in <code> sections. (160=&nbsp;)
by BrowserUk (Patriarch) on May 09, 2003 at 00:44 UTC

    I was seeing variously a hollow or solid square blob symbol wherever the 160 code was, depending on which of "Arndale Mono", "Bitstream Vera Sans Mono", "Code 2000", "Console437", "Courier", "Courier New", "Fixedsys", "HVRaster", "Lucida Console" and "System" fonts I specified to be used for "preformatted text" in the font configuration for Opera.

    Having tried accepting the Author stylesheet (node_id=234493 & node_id=204962) and overriding with local settings, I eventually discovered an option that is only available via the View menu (not in the extensive configuration dialog, where I spent ages trying every combination of even vaguely related options): View->Encoding->Automatic. At some point, I know not when, I apparently switched this setting off in favour of View->Encoding->Unicode->utf-16. Setting this back to automatic fixed the problem immediately. It also threw away the contents of the reply dialog, in which I had spent ages typing up all the different things I was trying as I went. Which is a good thing because at the end of the day, everything preceding this is just a cover for

    CLOSED: USER ERROR

    Sorry to have wasted your valuable time.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
      I guess you solved the problem, but since I authored the original node you referenced, I thought I'd just mention that I was copying and pasting from a utf8 encoded document using Internet Explorer on Mac OS 10.2.6. If in fact it stayed encoded in utf8, it is strange that it wouldn't work in utf16.

        As tye identified above, whatever character was encoded as the leading whitespace, by the time it had gone through cut&paste in your browser, transmission to PM, receipt by a perl script, storage in the PM DB, retrieval via a perl script, and transmission to my browser, it ended up encoded as a single byte with the value 160. Exactly where in the chain the transformation occurred I wouldn't even hazard a guess at.

        Suffice it to say, a lone byte of 160 is not a valid UTF-8 sequence, and read as UTF-16 it gets paired with a neighbouring byte into some unrelated character. So with the browser set to ignore the encoding information in the page and treat everything as UTF-16, it displayed the 'unknown character' symbol in its place, which is what I was seeing.
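
        A rough sketch of what happens to a run of 160 bytes under each interpretation. The sample line is made up, and Python's decoders stand in for the browser's, so treat this as an approximation. Note that in this sketch the UTF-16 decode does not fail outright; it yields unrelated characters, which a font may well have no glyph for.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Decode strictly if possible, else fall back to U+FFFD replacement."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data.decode(encoding, errors="replace")


# Hypothetical sample: two bytes of value 160 used as indentation, then code.
line = b"\xa0\xa0my $x;"

as_latin1 = try_decode(line, "iso-8859-1")  # two no-break spaces, then the code
as_utf8 = try_decode(line, "utf-8")         # each 0xA0 becomes U+FFFD
as_utf16 = try_decode(line, "utf-16-le")    # byte pairs fused into other characters

for name, text in (("latin-1", as_latin1), ("utf-8", as_utf8), ("utf-16", as_utf16)):
    print(name, [hex(ord(ch)) for ch in text])
```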

        The fact that some parts of the chain aren't yet set up to handle Unicode means that falling back to an 8-bit extended-ASCII (ANSI?) representation will persist for some time.

