Re: Re: Re: Unicode in <code> sections. (160= )

I guess you solved the problem, but since I authored the original node you referenced, I thought I'd just mention that I was copying and pasting from a UTF-8-encoded document using Internet Explorer on Mac OS 10.2.6. If it did in fact stay encoded as UTF-8, it is strange that it wouldn't work as UTF-16.

Replies are listed 'Best First'.

Re: Re: Re: Re: Unicode in <code> sections. (160= )
by BrowserUk (Patriarch) on May 09, 2003 at 22:32 UTC

As tye identified above, whatever character originally encoded the leading whitespace, by the time it had gone through cut&paste in your browser, transmission to PM, receipt by a perl script, storage in the PM DB, retrieval via a perl script, and transmission to my browser, it ended up as a single byte with value 160 (0xA0, which is outside 7-bit ASCII). Exactly where in the chain the transformation occurred, I wouldn't even hazard a guess.

Suffice it to say, a lone byte 160 is not valid on its own in either UTF-8 or UTF-16. So with the browser set to ignore the encoding information in the page and treat everything as UTF-16, it correctly displayed the 'unknown character' symbol in its place, which is what I was seeing.
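For anyone who wants to poke at this themselves, here is a minimal sketch of what a lone byte 160 looks like to different decoders. It is in Python rather than Perl, purely for illustration, and it assumes the stored data really was the single byte 0xA0; the variable names are mine and say nothing about what PM's scripts or any particular browser actually do.

```python
# Illustrative sketch only: how a lone byte 160 (0xA0) decodes under
# three encodings. The single-byte input is an assumption about what
# ended up stored; nothing here reproduces the real PM pipeline.
raw = bytes([160])

# Latin-1 (ISO-8859-1) maps each byte to the code point of the same
# value, so 0xA0 becomes U+00A0 NO-BREAK SPACE.
as_latin1 = raw.decode("latin-1")

# UTF-8: 0xA0 is a continuation byte, so on its own it is rejected.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# A lenient decoder substitutes U+FFFD, the 'unknown character' symbol.
utf8_lenient = raw.decode("utf-8", errors="replace")

# UTF-16: code units are two bytes wide, so one byte is truncated data.
try:
    raw.decode("utf-16-le")
    utf16_ok = True
except UnicodeDecodeError:
    utf16_ok = False

# For comparison, U+00A0 encoded properly in each form.
nbsp_utf8 = "\u00a0".encode("utf-8")       # two bytes: C2 A0
nbsp_utf16 = "\u00a0".encode("utf-16-le")  # two bytes: A0 00

print(repr(as_latin1), utf8_ok, repr(utf8_lenient), utf16_ok)
print(nbsp_utf8.hex(), nbsp_utf16.hex())
```

The comparison at the end is the interesting part: a properly encoded no-break space takes two bytes in both UTF-8 and UTF-16, so a bare 0xA0 suggests that somewhere along the way the text was handled as an 8-bit encoding.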

The fact that some parts of the chain aren't yet set up to handle Unicode means that falling back to an 8-bit extended-ASCII (ANSI?) representation will persist for some time.

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
