
OK, so the parser implementation is supposed to deal with any Unicode characters/codepoints that may show up in the resulting text (and should probably document how it deals with them). That makes sense, I guess.
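To make that concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The language and parser are my own choice for illustration, not anything from this thread, and `element_text` is a hypothetical helper. The document is pure ASCII on the wire, yet it contains the character reference `&#233;`. The parser hands the application the decoded codepoint U+00E9, not the original bytes, so how that character gets written back out is the application's decision.

```python
import xml.etree.ElementTree as ET


def element_text(doc: bytes) -> str:
    """Parse an XML document and return the root element's text content.

    The parser decodes the declared encoding and resolves character
    references, so the result is a str of Unicode codepoints.
    """
    root = ET.fromstring(doc)
    return root.text or ""


# Pure-ASCII bytes on the wire, but the text contains a non-ASCII character.
doc = b'<?xml version="1.0" encoding="us-ascii"?><p>caf&#233;</p>'
text = element_text(doc)

print(repr(text))              # 'café'
print(hex(ord(text[-1])))      # 0xe9
# Re-encoding is the application's job; this turns non-ASCII back into refs.
print(text.encode("ascii", "xmlcharrefreplace"))  # b'caf&#233;'
```

Using `xmlcharrefreplace` is only one possible output policy; an application could just as well emit UTF-8 and declare it.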

If you're not getting slightly dizzy by now, congrats. :-)

I'm not dizzy, but I've been dealing with strangely encoded "xml" documents for some time now, so I've already thought hard about it, and I got plenty dizzy back then. :-)

Any "official" documentation on this XML parser behaviour would still be appreciated, though; I could use it to slap some unnamed third parties with. :-)

Re^7: Character Conversion Conundrum
by Aristotle (Chancellor) on Dec 22, 2004 at 23:57 UTC

    Hmm. I wish I could point you somewhere concrete. This is stuff I gleaned from the xml-dev and xsl-list mailing lists, posted by people such as Tim Bray and Michael Kay. I assume the folks who wrote the specs know what they say. :-) Unfortunately, that means I have no reference on hand to point you to. Maybe I should look it up when I get the chance, or at least dig up the relevant archive links from the lists.

    Makeshifts last the longest.