in reply to Re: Encoding is a pain.
in thread Encoding is a pain.
For shame, hardburn! Doubting the human resolve like that. :-)
Seriously ... I think that we as programmers have been ill-served by the "goal" of backwards-compatibility, especially as it pertains to Unicode. There is a good solution to encoding all human written communication:
Ignore the ideas of:
Every character is listed, even if it's a billion characters. If you create a new character, add it at the end and update the appropriate language-specific subsets and collation sets.
If, like in some Asian languages, you can take two characters and combine them, have a combination character. We do the same thing in English with the correct spelling of "aether". If you need to, have a combine-2, combine-3, etc.
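A rough sketch of how a decoder might handle the combine-2 / combine-3 idea, in Python. Everything named here is an assumption for illustration: the post defines no concrete values, so the negative marker numbers and the `group_characters` helper are hypothetical.

```python
# Hypothetical marker values; the post does not assign any concrete numbers.
# Each marker maps to how many of the following characters it joins.
COMBINE = {-2: 2, -3: 3}


def group_characters(indices):
    """Split a flat sequence of master-list indices into display units.

    A unit is a tuple of indices: length 1 for a plain character,
    length N for the N characters that follow a combine-N marker.
    """
    units = []
    i = 0
    while i < len(indices):
        n = COMBINE.get(indices[i])
        if n is None:
            # Ordinary character: it stands on its own.
            units.append((indices[i],))
            i += 1
        else:
            group = tuple(indices[i + 1 : i + 1 + n])
            if len(group) != n:
                raise ValueError("combine marker is missing operands")
            units.append(group)
            i += 1 + n
    return units


# Five stored values, three units: a, (b + c) combined, d.
units = group_characters([10, -2, 1, 5, 20])
```

The stored stream stays a flat list of numbers; only the renderer needs to know that a marker glues the next N entries together.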
Then, you can have language-specific subsets (like ASCII, Latin-X, *-JIS, etc.) that refer to that master list. So, ASCII might still be the 128 characters we know and love, but they refer to 234, 12312, 5832, etc.
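To make the master-list-plus-subsets idea concrete, here is a minimal Python sketch. The `MasterList` and `Subset` classes are hypothetical names of my own, not an existing library, and the master indices that come out are whatever the insertion order happens to give.

```python
class MasterList:
    """Append-only list of characters; an index never changes once assigned."""

    def __init__(self):
        self._chars = []   # master index -> character
        self._index = {}   # character -> master index

    def add(self, char):
        """Return the index of char, appending it at the end if it is new."""
        if char not in self._index:
            self._index[char] = len(self._chars)
            self._chars.append(char)
        return self._index[char]

    def lookup(self, index):
        return self._chars[index]


class Subset:
    """A language-specific code table whose entries point into the master list."""

    def __init__(self, master, chars):
        self._master = master
        # Position in this table -> index in the master list.
        self._to_master = [master.add(c) for c in chars]

    def to_master(self, codes):
        return [self._to_master[c] for c in codes]

    def decode(self, codes):
        return "".join(self._master.lookup(i) for i in self.to_master(codes))


master = MasterList()
# Register some Greek letters first, so ASCII does not start at master index 0.
for ch in "αβγδ":
    master.add(ch)

# The subset keeps the familiar 7-bit ASCII numbering on the outside.
ascii_subset = Subset(master, [chr(i) for i in range(128)])

text = ascii_subset.decode([72, 105])        # the usual codes for "H" and "i"
indices = ascii_subset.to_master([72, 105])  # where they live in the master list
```

Existing data keeps its small, familiar codes, while anything that needs to mix languages works on the master indices. Adding a new character later only appends to the master list, so no existing subset table has to be renumbered.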
Sorting would be handled by the fact that your collation set DWIMs. And, each language subset can have a default collation set, just like Oracle does it.
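For the collation part, a small Python sketch of what "each subset has a default collation set" could look like. The `make_sort_key` helper, the `DEFAULT_COLLATION` table and the subset name are all made up for illustration; this is not how Oracle or any real database implements it.

```python
import string


def make_sort_key(collation_order):
    """Build a sort key function from a collation set.

    collation_order is a sequence of characters in the order they should sort.
    Characters missing from the set sort after all listed ones.
    """
    rank = {ch: i for i, ch in enumerate(collation_order)}
    unknown = len(rank)

    def key(word):
        return [rank.get(ch, unknown) for ch in word]

    return key


# Hypothetical default: letters interleaved as aAbBcC... instead of the
# raw code order, where every uppercase letter sorts before any lowercase one.
INTERLEAVED = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)

# Hypothetical table: subset name -> its default collation set.
DEFAULT_COLLATION = {"ascii": INTERLEAVED}

words = ["banana", "Apple", "apple", "Banana"]

by_code_order = sorted(words)
by_collation = sorted(words, key=make_sort_key(DEFAULT_COLLATION["ascii"]))
```

Here `by_code_order` comes out as Apple, Banana, apple, banana, while `by_collation` gives apple, Apple, banana, Banana. The stored numbers never change; only the collation set passed to the sort does.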
I fail to see the problem ...
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested