This is a rather small but useful module I hacked together a little while ago. It creates a transformation and sorting routine for any alphabet you give it. If you've got an alphabet in which vowels are sorted before consonants, you can use this module to create a sorting function that takes that into account.

You have to deal with lowercase and uppercase yourself, since (as in Klingon) they needn't sort to the same location. Alphabets are limited to 256 characters, because each character is translated to a single character in the range \000-\377.
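
To make the idea concrete, here is a minimal sketch of the underlying trick: rewrite every word so that a plain string comparison follows your alphabet's order, then sort on the rewritten form. It is not the module's code. The helper name make_key_fn is made up for illustration, and it uses a hash lookup where the module uses tr///.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Language::MySort): returns a closure
# that rewrites a word so that plain string comparison follows the
# order of the given alphabet.
sub make_key_fn {
    my @alphabet = @_;
    my %rank;
    for my $i (0 .. $#alphabet) {
        # the first occurrence of a character wins
        $rank{ $alphabet[$i] } = chr $i unless exists $rank{ $alphabet[$i] };
    }
    return sub {
        my ($word) = @_;
        # characters outside the alphabet pass through unchanged
        return join "", map { exists $rank{$_} ? $rank{$_} : $_ } split //, $word;
    };
}

my $key = make_key_fn(qw( a e i b c d f g h j ));

# compute each key once, sort on it, then unwrap the original words
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ $key->($_), $_ ] }
    qw( bad ice ace dig );

print "@sorted\n";    # prints "ace ice bad dig"
```

Because "a", "e", and "i" get the three lowest ranks, "ice" lands ahead of "bad" even though "b" precedes "i" in ASCII.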

I just updated it, changing the function syntax a bit, and adding a feature.

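    package Language::MySort;

    require Exporter;
    @ISA    = qw( Exporter );
    @EXPORT = qw( lang_sort );

    %words = ();

    sub lang_sort {
        my ($ignore, $same, $chars, $tr, $sorter) = ("", "");

        # an optional trailing hash reference holds the options
        if (ref $_[-1]) {
            my $opt = pop;
            $ignore = $opt->{ignore} || "";
            # the samples pass this map as "identical";
            # "translate" is still accepted as a fallback
            $same = $opt->{identical} || $opt->{translate} || "";

            # code that deletes the ignored characters
            $ignore = "\$s =~ tr/\Q$ignore\E//d;";

            if ($same) {
                # strip the first character (the target) off each group
                my @f = map substr($_, 0, 1, ""), @$same;
                $same = " =~ tr/" .
                    quotemeta(join "", @$same) . "/" .
                    quotemeta(join "", map $f[$_] x length($same->[$_]), 0 .. $#$same) . "/";
            }
        }

        $chars = @_ == 1 ? shift : join "", @_;

        # the transformer: fold identical characters, drop ignored
        # ones, then map the alphabet onto \000-\377 in sort order
        $tr = eval qq{
            sub {
                (my \$s = shift) $same;
                $ignore
                \$s =~ tr/\Q$chars\E/\000-\377/;
                \$s;
            }
        };

        # the sorter: remember which word produced which key, then
        # return the words in key order
        $sorter = sub {
            my @used = map $tr->($_), @_;
            @{ $words{$chars} }{ @used } = @_;
            @{ $words{$chars} }{ sort @used };
        };

        return wantarray() ? ($sorter, $tr) : $sorter;
    }

    1;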

Here's a sample run to create a sorter for (lowercase) French text (I don't think I left out any accented characters, but I could be wrong).

    use Language::MySort;

    *french_sort = lang_sort(
        # *the character list*
        # only includes the characters remaining after
        # the identical-character map has been applied
        'a' .. 'z',
        {
            # *the identical-character map*
            # maps characters to the character
            # they should sort identically as
            # "AXYZ" means that X, Y, and Z are translated as A
            identical => ["a\340", "c\347", "e\350\351\352\353", "o\364"],
        }
    );

    {
        local $, = " ";
        print french_sort(
            "\351tude", "\352tre", "tr\350s", "entrer", "\351t\351",
        );
    }
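
If the identical-character map is hard to picture, this standalone sketch spells out the folding step for the French list by hand. The fold_accents name is made up for illustration and the tr/// is written out literally; it is only a stand-in for what the module does with the list at run time. It assumes the words are Latin-1 byte strings, as in the sample.

```perl
use strict;
use warnings;

# Illustrative stand-in for the folding step: each accented character
# is replaced by the plain letter that leads its group in the map.
sub fold_accents {
    my ($word) = @_;
    $word =~ tr/\340\347\350\351\352\353\364/aceeeeo/;
    return $word;
}

my @words = ("\351tude", "\352tre", "tr\350s", "entrer", "\351t\351");

# compare the folded forms, but return the original words
my @sorted = sort { fold_accents($a) cmp fold_accents($b) } @words;
```

After folding, the five words compare as "etude", "etre", "tres", "entrer", and "ete", so the accented words sort among the unaccented ones instead of after "z".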

And here's a sample run for a small language of 10 characters in which the vowels "a", "e", and "i" sort before every other letter. The sorter also ignores the language's mid-word punctuation, "-" and ".":

    use Language::MySort;

    *weird_sort = lang_sort(
        # place vowels ahead of consonants
        qw( a e i b c d f g h j ),
        {
            # map uppercase characters to lowercase
            identical => [qw( aA bB cC dD eE fF gG hH iI jJ )],
            # ignore - and .
            ignore => "-.",
        }
    );

Because of the way the generator function works (using the tr/// operator), you can also write the above function call as:

    use Language::MySort;

    *weird_sort = lang_sort(
        # place vowels ahead of consonants
        qw( a e i ), 'a' .. 'j',
        {
            # map uppercase characters to lowercase
            identical => [qw( aA bB cC dD eE fF gG hH iI jJ )],
            # ignore - and .
            ignore => "-.",
        }
    );

Even though the vowels are duplicated in the character list, the transliteration operator will only recognize the first occurrence of them. It's a bit of Perl magic that the module takes advantage of to make your life a bit easier.
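
You can see the first-occurrence rule in isolation with a literal tr///. This is just a sketch: weird_key is a made-up name, and the search list is the same "a e i" plus 'a' .. 'j' list joined into one string.

```perl
use strict;
use warnings;

# The search list "aeiabcdefghij" repeats a, e and i. Only their first
# positions (0, 1 and 2) are used; the later duplicates are ignored.
sub weird_key {
    my ($word) = @_;
    $word =~ tr/aeiabcdefghij/\000-\014/;
    return $word;
}

my @sorted = sort { weird_key($a) cmp weird_key($b) } qw( bee ebb aid ice );

print "@sorted\n";    # prints "aid ebb ice bee"
```

The consonants end up at positions 4 and above with some gaps between them, which doesn't matter: only the relative order counts.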

Finally, here's a simpler sorter for English alphabetical order that puts capital letters before their lowercase counterparts, but intersperses uppercase and lowercase words (so you get Axxx axxx Bxxx bxxx, not Axxx Bxxx axxx bxxx).

    use Language::MySort;

    *sorter = lang_sort(
        # nifty way to make (A, a, B, b, C, c, ... Z, z)
        (map +($_, lc), 'A' .. 'Z'),
        { ignore => q{-} }
    );
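
Here's a quick standalone check of what that map expression builds and what ordering it leads to. It's a sketch only: key_for is a made-up helper that ranks characters with a hash instead of calling the module, and it leaves out the ignore option.

```perl
use strict;
use warnings;

# (A, a, B, b, ... Z, z): each capital immediately precedes its lowercase
my @alphabet = (map +($_, lc), 'A' .. 'Z');

# rank every character by its position in that list
my %rank;
@rank{@alphabet} = 0 .. $#alphabet;

# hypothetical helper: rewrite a word as a string of rank characters
sub key_for {
    my ($word) = @_;
    return join "", map { exists $rank{$_} ? chr($rank{$_}) : $_ } split //, $word;
}

my @words       = qw( banana Apple apple Banana );
my @interleaved = sort { key_for($a) cmp key_for($b) } @words;
my @plain       = sort @words;

print "@interleaved\n";    # prints "Apple apple Banana banana"
print "@plain\n";          # prints "Apple Banana apple banana"
```

The plain sort groups all capitalized words first, while the interleaved alphabet keeps each letter's uppercase and lowercase words next to each other.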

In reply to Language::MySort by japhy
