in reply to Re: Sorting according to locale collation
in thread Sorting according to locale collation

The Lithuanian dictionary must be wrong, as "i" and "y" are not the same letter. Have they documented the rule, is it self-consistent, and do the dictionary entries match it? If the answer to any of those is no, then randomise the listing for .arts sorts.

What is critical for collation is that the ordering at every character position is monotonic.

LC_ALL=C (or at least LC_COLLATE=C) is the only legal value. Any other value is known to break strcoll(). Better to use the safer strcmp().

It should always compare by character numerical value, i.e. either byte value (US-ASCII, ISO-8859) or possibly Unicode code point. Comparing a byte at a time is simpler and won't break existing applications.
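To make that concrete, here is a minimal sketch of sorting by byte value, roughly what strcmp() would give you on UTF-8 data. The word list and the name byte_key are made up for illustration; nothing here comes from the original thread.

```python
# Hypothetical word list, chosen only for illustration.
words = ["zebra", "Apple", "ąžuolas", "yra", "ir", "apple"]


def byte_key(s: str) -> bytes:
    """Sort key: the raw UTF-8 bytes, compared as unsigned values."""
    return s.encode("utf-8")


# Byte-at-a-time ordering, independent of any locale setting.
by_bytes = sorted(words, key=byte_key)

# Python compares str objects by code point, so this is the
# "UTF code point" alternative mentioned above.
by_code_point = sorted(words)

print(by_bytes)
print(by_bytes == by_code_point)
```

For valid UTF-8 the two orderings coincide, so the simpler byte-wise comparison loses nothing relative to comparing code points.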

With EBCDIC 1047 it will never be alphabetical order, but it will be in a consistent order and able to be bsearched. Comparing UTF-8 a byte at a time will also produce odd but consistent results.
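A rough sketch of the "able to be bsearched" point, using Python's bisect module in place of C's bsearch(). The word list and the helper name contains are hypothetical. All the search needs is that the list was sorted with the same byte-wise comparison that is used for the lookup.

```python
from bisect import bisect_left

# Hypothetical word list, stored as UTF-8 bytes and sorted by byte value.
words = ["zebra", "Apple", "ąžuolas", "yra", "ir", "apple"]
keys = sorted(w.encode("utf-8") for w in words)


def contains(sorted_keys: list, word: str) -> bool:
    """Binary search for word in a list sorted by raw byte value."""
    key = word.encode("utf-8")
    i = bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


print(contains(keys, "yra"))  # present in the list
print(contains(keys, "Yra"))  # different bytes, so not found
```

The order is not alphabetical ("Apple" sorts before "apple", and "ąžuolas" lands after "zebra"), but it is the same on every run and under every locale, which is all a binary search requires.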

Please get rid of i18n and l10n from at least the curses screen and the command line. Other charsets are okay so long as they don't break sort, look, etc. As for GUIs with internal UTF-16 host-endian buffers, I don't care so long as they read and write UTF-8 to the system.
