in reply to Re^3: Encoding is a pain.
in thread Encoding is a pain.
I think I wasn't as explicit as I should have been.
That's a font issue, not an encoding issue. A font would map to a given subset or collation set. Or, if you want, you could map your font to the master character set. It would be up to the application to figure out what to do with characters that don't appear in the font. There are English fonts that don't have glyphs for all the characters in English. (Symbol is a good example.)
And this isn't a radical departure. Fonts are mapped to character sets right now, but the work is done by the display library code. I'm proposing that the font itself would contain the metadata needed to describe either the subset/collation set it represents or a subset definition of its own. Again, this is the backwards-compatible part. Fonts, historically, have been the province of the application. Later, fontmaps were created, but the application still maintained control over how to interpret the fontmap. Instead, why doesn't the application ask the fontmap if it knows how to represent character #234211 and, if so, would it please return the appropriate glyph?
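To make the "ask the font" idea concrete, here is a rough sketch in Python. Everything in it is hypothetical: the `Font` class, the `covers` and `glyph_for` methods, and the glyph strings are names I made up for illustration, not any real font API. The only point is that the font answers the coverage question and the application decides what to do on a miss.

```python
from typing import Dict, Iterable, Optional


class Font:
    """Hypothetical font that carries its own coverage metadata.

    Keys are character numbers in the (assumed) master character set;
    values stand in for glyph data.
    """

    def __init__(self, name: str, glyphs: Dict[int, str]) -> None:
        self.name = name
        self._glyphs = dict(glyphs)

    def covers(self, char_id: int) -> bool:
        # The font, not the display library, answers the coverage question.
        return char_id in self._glyphs

    def glyph_for(self, char_id: int) -> Optional[str]:
        # Return the glyph if the font has one, otherwise None.
        return self._glyphs.get(char_id)


def render(font: Font, char_ids: Iterable[int], fallback: str = "?") -> str:
    """Application-side policy: substitute a fallback for missing glyphs."""
    pieces = []
    for char_id in char_ids:
        glyph = font.glyph_for(char_id)
        pieces.append(glyph if glyph is not None else fallback)
    return "".join(pieces)


# Made-up font that knows two characters, including #234211.
demo_font = Font("DemoSans", {65: "A", 234211: "<glyph-234211>"})
```

Here the fallback is a plain "?", but that choice belongs to the application; a different application could just as well skip the character or try a second font.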
The better answer is to have a collation set that represents 7-bit ASCII. There is no need to have collation sets be the same size as each other. A Chinese character set would probably run to at least 30k characters. An English character set could run as small as 40 or 50, if you ignore case. Some languages might be able to get away with even fewer.
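A minimal sketch of collation sets as differently sized, ordered views onto one master set might look like the following. The `CollationSet` class and its method names are my own invention, and I'm assuming purely for the demo that master character numbers happen to coincide with ASCII code points.

```python
from typing import Dict, Iterable, List


class CollationSet:
    """Hypothetical ordered subset of a master character set."""

    def __init__(self, name: str, char_ids: Iterable[int]) -> None:
        self.name = name
        self._order: List[int] = list(char_ids)
        # Position in the list doubles as the collation rank.
        self._rank: Dict[int, int] = {
            char_id: i for i, char_id in enumerate(self._order)
        }

    def __len__(self) -> int:
        return len(self._order)

    def __contains__(self, char_id: int) -> bool:
        return char_id in self._rank

    def sort_key(self, char_id: int) -> int:
        # Raises KeyError for characters outside this set.
        return self._rank[char_id]


# Assumption for the demo: master IDs 0..127 line up with 7-bit ASCII.
ascii7 = CollationSet("ascii-7bit", range(128))

# A much smaller set: caseless English letters only.
english_caseless = CollationSet(
    "english-caseless", (ord(c) for c in "abcdefghijklmnopqrstuvwxyz")
)
```

The two sets have 128 and 26 members respectively, and nothing forces them to agree on size or ordering; each one only has to say which master characters it includes and how they sort.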
If your system is sufficiently general, with appropriate views into it, then there is nothing you cannot emulate. Heck, if one were truly masochistic, one might even develop a collation set that would emulate Unicode. :-)
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested