in reply to Re: Re: Unicode in <code> sections. (160=&nbsp;)
in thread Unicode in <code> sections.

I guess you solved the problem, but since I authored the original node you referenced, I thought I'd just mention that I was copying and pasting from a UTF-8 encoded document using Internet Explorer on Mac OS 10.2.6. If it did in fact stay encoded as UTF-8, it is strange that it wouldn't work in UTF-16.

Re: Re: Re: Re: Unicode in <code> sections. (160=&nbsp;)
by BrowserUk (Patriarch) on May 09, 2003 at 22:32 UTC

    As tye identified above, whatever character originally encoded the leading whitespace, by the time it had gone through cut & paste in your browser, transmission to PM, receipt by a perl script, storage in the PM DB, retrieval via a perl script, and transmission to my browser, it had ended up as the single byte 160 (0xA0, which is outside the 7-bit ASCII range). Exactly where in the chain the transformation occurred, I wouldn't even hazard a guess.
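    To make the "ended up as 160" point concrete, here is a minimal sketch, assuming the original whitespace was a NO-BREAK SPACE (U+00A0), which the thread title suggests but which nobody confirmed. The helper name encoded_bytes is made up for illustration. It shows that only an 8-bit Latin-1 style encoding produces a bare byte 160 for that character.

```python
# Assumption: the leading whitespace was U+00A0 (NO-BREAK SPACE).
NBSP = "\u00a0"

def encoded_bytes(text, encoding):
    """Return the byte values produced by encoding `text` (hypothetical helper)."""
    return list(text.encode(encoding))

latin1_bytes = encoded_bytes(NBSP, "latin-1")    # one byte: 160
utf8_bytes = encoded_bytes(NBSP, "utf-8")        # two bytes: 194, 160
utf16_bytes = encoded_bytes(NBSP, "utf-16-be")   # two bytes: 0, 160

for name, values in [("latin-1", latin1_bytes),
                     ("utf-8", utf8_bytes),
                     ("utf-16-be", utf16_bytes)]:
    print(name, values)
```

    If a bare 160 arrives at the far end, then some step probably converted the text to an 8-bit encoding, or dropped the leading UTF-8 byte, but this sketch cannot tell you which step did it.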

    Suffice it to say, a lone byte 160 is not a valid sequence in UTF-8, nor a complete code unit in UTF-16. So with the browser set to ignore the encoding information in the page and treat everything as UTF-16, it correctly displayed the 'unknown character' symbol in its place, which is what I was seeing.
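    The 'unknown character' behaviour can be reproduced with a short sketch. This uses Python's built-in codecs as a stand-in for whatever the browser does internally, which is an assumption; the helper names is_valid and decode_lenient are invented for illustration.

```python
def is_valid(data, encoding):
    """True if `data` decodes cleanly under `encoding` (hypothetical helper)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

def decode_lenient(data, encoding):
    """Decode, substituting U+FFFD (the replacement character) for bad input."""
    return data.decode(encoding, errors="replace")

lone = b"\xa0"  # the single byte 160

print(is_valid(lone, "utf-8"))         # a continuation byte with no lead byte
print(is_valid(lone, "utf-16-be"))     # half of a 16-bit code unit
print(is_valid(lone, "latin-1"))       # every byte is valid in Latin-1
print(is_valid(b"\xc2\xa0", "utf-8"))  # the proper UTF-8 form of U+00A0

# A lenient decoder shows the replacement character where the byte was.
shown = decode_lenient(b"\xa0x", "utf-8")
print(repr(shown))
```

    A strict decoder rejects the byte outright, while a lenient one, like a browser, substitutes the replacement character and carries on.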

    The fact that some parts of the chain aren't yet set up to handle Unicode means that falling back to an 8-bit extended-ASCII (ANSI?) representation will persist for some time.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller