Arguably, the body with the most experience of the problems posed by different character sets, diacriticals, and their influence on collation order is the European Union.

Here are the official EU collation sequences for its member countries:

As best I can tell, your invented ordering would be incorrect for every country except possibly the UK. In particular, note how Denmark, Sweden and Finland order the characters with diacriticals at the end.
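To make that contrast concrete, here is a minimal Python sketch (Python is chosen only for illustration; nothing in the thread prescribes it). It assumes a simplified Swedish alphabet of A to Z followed by Å, Ä, Ö, and compares a plain code-point sort with a sort keyed on that alphabet. The names `SWEDISH_ALPHABET`, `RANK` and `swedish_key` are hypothetical, and real Swedish collation has further rules this ignores.

```python
# Simplified Swedish-style alphabet (an assumption for illustration):
# the 26 basic Latin letters followed by Å, Ä, Ö at the very end.
SWEDISH_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word: str) -> list[int]:
    """Hypothetical sort key: rank each letter by its position in SWEDISH_ALPHABET.

    Case is folded with str.upper(); characters outside the alphabet sort
    after it, ordered by code point.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.upper()]


words = ["Örebro", "Ängelholm", "Åre", "Zinkgruvan", "Arvika"]
print(sorted(words))                   # ['Arvika', 'Zinkgruvan', 'Ängelholm', 'Åre', 'Örebro']
print(sorted(words, key=swedish_key))  # ['Arvika', 'Zinkgruvan', 'Åre', 'Ängelholm', 'Örebro']
```

Plain code-point order happens to put these letters after Z as well, but as Ä, Å, Ö rather than the Swedish Å, Ä, Ö, which is exactly the kind of per-country disagreement described above.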

More than thirty years ago (possibly longer; it's not clear), realising that there was no way to reconcile the disparate expectations of all the member countries, the EU took a pragmatic approach to the problem:

Using Accented and Other Special Characters in Searching

The EU Inventories contain data in all Community languages except Greek and many of these languages contain accented characters in their alphabet.

All words containing accented characters are displayed as such in both WinSPIRS and WebSPIRS. For the former, you may need to choose a font other than the default font if it does not support the ISO 8859-1 (Latin alphabet No. 1) character set (known elsewhere in this database compendium as ISO Latin-1) for display/printing. All words containing accented or foreign characters (as well as a to z and A to Z) are converted to their upper case equivalents and then indexed as such. The collating sequence chosen for all indices in all languages is that of ISO Latin-1, except that all terms beginning with a numeric character appear at the end. This has been done to provide ease and consistency in a multi-lingual and multi-database (i.e. when two or more databases from different languages are selected for retrieval) environment.

The actual collating sequence or character order in all indices is:

-, ., A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, À, Á, Â, Ã, Ä, Å, Æ, Ç, È, É, Ê, Ë, Ì, Í, Î, Ï, Ñ, Ò, Ó, Ô, Õ, Ö, Ø, Ù, Ú, Û, Ü, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
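As a rough illustration of how such an index order could be reproduced, here is a minimal Python sketch, assuming the character sequence quoted above is the complete ranking. It folds each term to upper case and ranks every character by its position in that sequence. The names `EU_ORDER`, `EU_RANK` and `eu_index_key` are hypothetical, and the fallback for unlisted characters (such as spaces) is my own assumption, not something the EU text specifies. Note too that Python's `str.upper()` is Unicode-aware (e.g. "ß" becomes "SS"), which may not match what the original indexing software did.

```python
# Character order taken from the EU index sequence quoted above,
# listing each character once.
EU_ORDER = (
    "-."
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜ"
    "0123456789"
)
EU_RANK = {ch: i for i, ch in enumerate(EU_ORDER)}


def eu_index_key(term: str) -> list[int]:
    """Hypothetical sort key mimicking the quoted EU index rules.

    1. Fold the term to upper case, as the EU text describes.
    2. Rank each character by its position in EU_ORDER, so digits come last.
    Characters not in EU_ORDER (e.g. spaces) sort after every listed
    character, by code point; that fallback is an assumption.
    """
    return [EU_RANK.get(ch, len(EU_RANK) + ord(ch)) for ch in term.upper()]


terms = ["zebra", "Élan", "2004 report", "apple", "Ångström", "éclair"]
print(sorted(terms))                    # ['2004 report', 'apple', 'zebra', 'Ångström', 'Élan', 'éclair']
print(sorted(terms, key=eu_index_key))  # ['apple', 'zebra', 'Ångström', 'éclair', 'Élan', '2004 report']
```

Because the key is a list of ranks, Python compares terms character by character, and case folding makes "éclair" and "Élan" interleave by their second letter instead of splitting by case as in the plain sort.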


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^2: best sort by BrowserUk
in thread best sort by ag4ve
