moritz,
But all the references in the article are related to data in databases. I Googled ASCII and UTF-8, and repeatedly found statements like "...UTF-8 uses one byte for any ASCII characters, which have the same code values in both UTF-8 and ASCII encoding...", so why are characters 0 - 127 being redefined? I understand the complexity of the subject, but the designers of UTF-8 knew better than to mess with ASCII, and that is why UTF-8 enhances ASCII.
'Unicode::Collate' is core, so it could be used a lot in the future, as it should be. But a lot of production environments will be affected if they don't know in advance that the code points of ASCII have been redefined.
My hope was that someone would say that 'ASCII => 1' will work like Perl's 'sort' for ASCII characters, and like UTF-8, etc. for anything above 127.
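To make the difference concrete, here is a minimal sketch comparing Perl's built-in 'sort' with the default 'Unicode::Collate' sort on plain ASCII input. The word list is made up for illustration, and no special collator options are assumed.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Illustrative ASCII-only input (not taken from the article).
my @words = qw(b A a B);

# Built-in sort compares strings by code point, so uppercase ASCII
# (65-90) comes before lowercase ASCII (97-122).
my @by_codepoint = sort @words;

# Unicode::Collate orders by UCA collation weights, not by code point,
# so the result can differ even for pure ASCII input.
my @by_uca = Unicode::Collate->new->sort(@words);

print "sort:             @by_codepoint\n";
print "Unicode::Collate: @by_uca\n";
```

The code points themselves are unchanged; only the order in which the collator ranks them differs from code point order.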
Thank you
"Well done is better than well said." - Benjamin Franklin
tchrist,
Thank you for your explanation/demonstration of how the UCA sort works.
I have already answered your previous post, and have apologized for mis-quoting the article.
It is not whether something prints or not that matters to the database engine, but rather the 'lt, eq, gt' that counts. Keys must be ordered so that every key before a given key is less than it, and every key after it is greater than it. So looking at your example, it seems that only chr(0) to chr(31) would be a problem. I have written 3 database engines in my life: in the 70's in assembler, in the 80's in C, and recently in Perl. Unfortunately, staring at a lot of hex dumps is required (even in Perl). The one thing all of these had in common is that all data passed from the user must be inserted into the database. So when a database is created, the start key is the "" value (length of 0). This is because the user could put in:
$key="\0"; $data = "\0";
which are valid characters. Now, that could be fixed by documenting this behavior. But chr(0) to chr(31) are used for many internal things in the DB engine, and changing their order in sort would be a show stopper.
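The ordering requirement can be sketched as follows. This is only an illustration with made-up keys, not code from any of the engines mentioned; it assumes no 'use locale' is in effect, so 'sort' and 'lt' both compare by code point.

```perl
use strict;
use warnings;

# Hypothetical keys a user might store, including control characters
# and the empty start key.
my @keys = ("b", "\0", "", "a", "\x1f", " ");

# Built-in sort uses string comparison, the same ordering as lt/eq/gt.
my @sorted = sort @keys;

# Check the invariant: each key is strictly less than the one after it.
for my $i (0 .. $#sorted - 1) {
    die "order violated at index $i"
        unless $sorted[$i] lt $sorted[$i + 1];
}

# Show each key as its code point(s); the empty key has none.
print join(" ", map { length($_) ? sprintf("%vd", $_) : "(empty)" } @sorted), "\n";
# prints: (empty) 0 31 32 97 98
```

The empty string sorts before "\0", which is why it can serve as the start key, and the control characters keep their code point order ahead of the space character.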
Thank you
"Well done is better than well said." - Benjamin Franklin