The Lithuanian dictionary must be wrong, as "i" and "y" are not the same. Have they documented the rule, is it self-consistent, and do the dictionary entries match it? If the answer to any of these is no, then randomise the listing for .arts sorts.

What is critical for collation is that the ordering is monotonic at every character position: the same two characters must always compare the same way.

LC_ALL=C (or at least LC_COLLATE=C) is the only legal value. Any other value is known to break strcoll(). Better to use the safer strcmp(), which ignores the locale altogether.
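A minimal sketch of that setup in C, assuming you control the program and can pin the collation category yourself. The helper names pin_c_collation() and assert_collation_matches_strcmp() are made up for illustration; setlocale(), strcoll() and strcmp() are the standard library calls. The check only compares the sign of the two results and is meant for plain ASCII input.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: pin only the collation category to "C".
   Returns 1 on success, 0 if setlocale() refused the request. */
static int pin_c_collation(void)
{
    return setlocale(LC_COLLATE, "C") != NULL;
}

/* Hypothetical sanity check: once LC_COLLATE is "C", strcoll() is
   expected to order two strings the same way strcmp() does.
   Only the sign of each result is compared, not the magnitude. */
static void assert_collation_matches_strcmp(const char *a, const char *b)
{
    int c = strcoll(a, b);
    int s = strcmp(a, b);
    assert(((c > 0) - (c < 0)) == ((s > 0) - (s < 0)));
}
```

Calling pin_c_collation() once at startup and then using strcmp() everywhere avoids depending on strcoll() at all; the check is only there to show that the two agree.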

It should always compare by the numerical value of each character, i.e. either the byte value (US-ASCII, ISO-8859) or possibly the Unicode code point. Comparing a byte at a time is simpler and won't break existing applications.
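As a sketch of the byte-at-a-time approach: strcmp() compares bytes as unsigned char values and never consults the locale, so it can be used directly as the ordering for qsort(). The names cmp_bytes() and sort_bytes() are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of char pointers.
   strcmp() orders by unsigned byte value and ignores LC_COLLATE. */
static int cmp_bytes(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}

/* Hypothetical helper: sort n NUL-terminated strings in byte order. */
static void sort_bytes(const char **words, size_t n)
{
    assert(words != NULL || n == 0);
    qsort(words, n, sizeof words[0], cmp_bytes);
}
```

Given the ASCII words "yra", "ir", "Zebra" and "abc", this yields Zebra, abc, ir, yra: uppercase letters have lower byte values than lowercase ones, and "i" sorts well before "y" instead of next to it.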

With EBCDIC 1047 the result will never be alphabetical order, but it will be a consistent order that can be searched with bsearch(). Comparing UTF-8 a byte at a time will also produce odd but consistent results.
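To illustrate why a consistent order is enough, here is a rough sketch of a lookup with bsearch() over a table that was sorted by byte value. The function names cmp_key() and lookup() are assumptions for this example; the only requirement is that the table was sorted with the same byte-wise comparison that the search uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* bsearch() comparator: the key is a plain string, the element is a
   pointer to a string stored in the table. */
static int cmp_key(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Hypothetical helper: returns 1 if word is present in a table that
   is already sorted in byte order, 0 otherwise. */
static int lookup(const char *word, const char *const *sorted, size_t n)
{
    assert(sorted != NULL || n == 0);
    return bsearch(word, sorted, n, sizeof sorted[0], cmp_key) != NULL;
}
```

The table may hold UTF-8 text: multi-byte sequences start with bytes of 0x80 or above, so accented words land after all the plain ASCII ones. That is not dictionary order, but the search still works because sort and search agree.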

Please get rid of i18n and l10n from at least the curses screen and the command line. Other charsets are okay so long as they don't break sort, look, etc. As for GUIs with internal host-endian UTF-16 buffers, I don't care so long as they read and write UTF-8 to the system.


In reply to Re^2: Sorting according to locale collation by Anonymous Monk
in thread Sorting according to locale collation by amir_e_a
