in reply to Language::MySort

I've got some feature requests, which I think would require changing how you do it in some cases... but I've got some ideas there.

First off, and I don't have an implementation idea here: could you allow sequences longer than one char to sort as identical? For example, in German, ue should often sort in the same place as ü, and likewise ae=>ä, oe=>ö (ë, ï, and ÿ aren't in German). (Also, es-zet/sharp-s should sort the same as ss, but I'm too lazy to type that properly.) (This isn't quite the traditional sort order, BTW.)
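A minimal sketch of the idea, in Python rather than Perl and not based on how Language::MySort works internally: fold each umlaut (and sharp-s) to its multi-character spelling before comparing, so both spellings produce the same key. The names `FOLD` and `german_key` and the word list are made up for illustration.

```python
# Hypothetical folding table: each special character maps to the
# multi-character sequence it should sort identically to.
FOLD = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def german_key(word):
    """Return a sort key in which ü and ue (etc.) are indistinguishable."""
    lowered = word.lower()
    return "".join(FOLD.get(ch, ch) for ch in lowered)


words = ["Muster", "Müller", "Mahler", "Mueller"]
# sorted() is stable, so words with equal keys keep their input order.
print(sorted(words, key=german_key))
```

Folding in this direction (ü to ue) avoids having to scan for digraphs; going the other way (ue to ü) would need a longest-match scan over the input instead of a per-character lookup.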

Also, the possibility of having alphabets longer than 255 chars would be nice. You could do this by having the RHS of your tr be based on the count of chars in the alphabet rather than a static 0-255.
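To illustrate the "rank by position in the alphabet" idea without a 0-255 ceiling, here is a rough Python sketch (again, not the module's actual tr-based approach). Each character is replaced by its index in the alphabet, so the number of distinct ranks grows with the alphabet. `make_sort_key` and the 300-character test alphabet are assumptions made up for the example.

```python
def make_sort_key(alphabet):
    """Build a key function that ranks characters by their position in `alphabet`."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    unknown = len(rank)  # characters outside the alphabet sort after all known ones

    def sort_key(word):
        return [rank.get(ch, unknown) for ch in word]

    return sort_key


# A made-up 300-character alphabet: a block of CJK code points in reverse order,
# purely to show that more than 255 ranks work.
big_alphabet = [chr(cp) for cp in range(0x4E00 + 299, 0x4E00 - 1, -1)]
key = make_sort_key(big_alphabet)

words = [chr(0x4E00), chr(0x4E00 + 299), chr(0x4E00 + 150)]
print(sorted(words, key=key))
```

Lists of integers compare element by element, which plays the same role as comparing the translated strings would in the tr approach.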


Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Re: Language::MySort
by Juerd (Abbot) on May 28, 2003 at 07:03 UTC

    ue should sort the same place as ü often, and likewise ae=>ä, oe=>ö

    Same for Esperanto.

    The Esperanto alphabet is
    a b c ĉ d e f g ĝ h ĥ i j ĵ k l m n o p r s ŝ t u ŭ v z.

    But in ASCII this is written as
    a b c cx d e f g gx h hx i j jx k l m n o p r s sx t u ux v z.
    This is safe, because the Esperanto alphabet has no real "x".

    It'd be nice if we could sort ASCII German or Esperanto without using Lingua::DE::Ascii or Lingua::EO::Supersignoj.
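One way this could work, sketched in Python and not tied to either of those modules or to Language::MySort: treat "cx", "gx" and so on as single letters of the alphabet, split each word into letters with a longest-match scan, and sort on the letters' positions. The names `ALPHABET`, `TOKEN` and `esperanto_key` are made up, and the word list is only illustrative.

```python
import re

# The ASCII (x-convention) Esperanto alphabet, with digraphs as single letters.
ALPHABET = "a b c cx d e f g gx h hx i j jx k l m n o p r s sx t u ux v z".split()
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

# Try longer letters first so "cx" is matched before "c".
TOKEN = re.compile(
    "|".join(sorted((re.escape(letter) for letter in ALPHABET), key=len, reverse=True))
)


def esperanto_key(word):
    """Return the list of alphabet positions for the letters of `word`.

    Characters that are not in the alphabet are silently skipped.
    """
    return [RANK[letter] for letter in TOKEN.findall(word.lower())]


words = ["uxato", "uzi", "cxar", "cerbo"]
print(sorted(words))                     # plain code-point order puts "uxato" before "uzi"
print(sorted(words, key=esperanto_key))  # alphabet order puts "uzi" (u) before "uxato" (ux)
```

A plain ASCII sort already gets most cases right because "x" is near the end of the Latin alphabet; the difference only shows up when the next letter is "z", as in the example.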

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }