
tchrist,

Thank you for your explanation and demonstration of how UCA sorting works.

I have already answered your previous post and apologized for misquoting the article.

What matters to the database engine is not whether a character prints, but how the keys compare with 'lt', 'eq' and 'gt'. The keys must be strictly ordered: every key before a given key must compare less than it, and every key after it must compare greater than it. Looking at your example, it seems that only chr(0) through chr(31) would be a problem.
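The strict-ordering requirement can be checked mechanically. What follows is a minimal sketch, assuming the keys sit in a Perl list and are compared either with the native `cmp` or with a default `Unicode::Collate` object. `strictly_ordered` is a hypothetical helper, not part of any module. In the default collation table (DUCET), chr(0) is completely ignorable, so two distinct keys that differ only by a NUL are expected to compare equal under the collator. An index cannot tolerate that, because both keys would claim the same slot.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical helper: true if every adjacent pair of keys compares
# strictly less-than under the supplied comparator (a valid index order).
sub strictly_ordered {
    my ($cmp, @keys) = @_;
    for my $i (1 .. $#keys) {
        return 0 unless $cmp->($keys[$i - 1], $keys[$i]) < 0;
    }
    return 1;
}

# Default collator: DUCET table, variable => 'shifted', level 4.
my $collator = Unicode::Collate->new();

# Two distinct keys that differ only by a trailing chr(0).
my @keys = ("a", "a\0");

my $byte_order = strictly_ordered(sub { $_[0] cmp $_[1] }, @keys);
my $uca_order  = strictly_ordered(sub { $collator->cmp($_[0], $_[1]) }, @keys);

print "native cmp: ", ($byte_order ? "strict" : "not strict"), "\n";
print "UCA:        ", ($uca_order  ? "strict" : "not strict"), "\n";
```

With native `cmp` the pair is strictly ordered; with the collator the NUL is dropped from the sort key, so the comparison is expected to return 0 and the order is not strict.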

I have written three database engines in my life: in the '70s in assembler, in the '80s in C, and recently in Perl. Unfortunately, staring at a lot of hex dumps is required (even in Perl). The one thing all of them had in common is that any data passed in by the user must be insertable into the database. So when a database is created, the starting key is the empty string "" (length 0), because the user could put in:

$key = "\0"; $data = "\0";
which are valid characters. That case could be handled by documenting the behavior, but chr(0) through chr(31) are used for many internal purposes by the DB engine, so changing their sort order would be a show stopper.
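Why the empty string works as the start key can be sketched as follows, assuming keys are compared with Perl's native string `cmp` (codepoint order, no `use locale`) rather than a UCA collator. The key list is illustrative only. The sketch shows that "" compares below every non-empty key, including "\0", and that chr(0) through chr(31) sort below the printable range in that order.

```perl
use strict;
use warnings;

# Illustrative user keys, including control characters a user may insert.
my @user_keys = ("a", "A", " ", "\x1F", "\0");

# The empty string (length 0) is a prefix of every string, so under
# native cmp it compares less than any non-empty key.
my $start_key = "";
my $below_all = !grep { ($start_key cmp $_) >= 0 } @user_keys;

# Native cmp orders by codepoint: control characters come first.
my @sorted = sort { $a cmp $b } @user_keys;
print join(" ", map { sprintf("%02X", ord($_)) } @sorted), "\n";
# prints "00 1F 20 41 61"
```

Because native `cmp` never treats a character as ignorable, every distinct key keeps a distinct position, which is exactly what the engine's internal use of the low control characters relies on.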

Thank you

"Well done is better than well said." - Benjamin Franklin