in reply to Re: Re: Re: Unicode in <code> sections. (160=&nbsp;)
in thread Unicode in <code> sections.

As tye identified above, whatever character was encoded as the leading whitespace, by the time it had gone through cut&paste in your browser, transmission to PM, receipt by a perl script, storage in the PM DB, retrievial via a perl script, and being transmitted to my browser, it ended up encoded as ascii 160. Exactly where in the chain the transformation occurred I wouldn't even hazard a guess at.

Suffice it to say, as character code 160 is illegal as either utf-8 or utf-16, with the browser set to ignore the encoding information in the page and treat everything as utf-16, it correctly displayed the 'unknown character' symbol in its place, which is what I was seeing.

The fact that some parts of the chain aren't yet set up to handle unicode means that falling back to 8-bit extended-ascii (ANSI?) representation will persist for sometime.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
  • Comment on Re: Re: Re: Re: Unicode in <code> sections. (160=&nbsp;)