You have to handle lowercase and uppercase yourself, since (as in Klingon) they needn't sort to the same position. Alphabets are limited to 256 characters.
I just updated it, changing the function syntax a bit, and adding a feature.
    package Language::MySort;

    require Exporter;
    @ISA    = qw( Exporter );
    @EXPORT = qw( lang_sort );

    %words = ();

    sub lang_sort {
      my ($ignore, $same, $chars, $tr, $sorter) = ("", "");

      # a trailing hash reference holds the options
      if (ref $_[-1]) {
        my $opt = pop;
        $ignore = $opt->{ignore} || "";
        $same = $opt->{identical} || "";
        $ignore = "\$s =~ tr/\Q$ignore\E//d;";
        if ($same) {
          # "AXYZ" becomes the target "A" (kept in @f) and the sources "XYZ"
          my @f = map substr($_, 0, 1, ""), @$same;
          $same = " =~ tr/" .
            quotemeta(join "", @$same) . "/" .
            quotemeta(join "", map $f[$_] x length($same->[$_]), 0 .. $#$same) .
            "/";
        }
      }

      $chars = @_ == 1 ? shift : join "", @_;

      # build the function that turns a word into its sort key
      $tr = eval qq{
        sub {
          (my \$s = shift) $same;
          $ignore
          \$s =~ tr/\Q$chars\E/\000-\377/;
          \$s;
        }
      };

      # sort the keys as plain strings, then map them back to the words
      $sorter = sub {
        my @used = map $tr->($_), @_;
        @{ $words{$chars} }{ @used } = @_;
        @{ $words{$chars} }{ sort @used };
      };

      return wantarray() ? ($sorter, $tr) : $sorter;
    }

    1;

Here's a sample run to create a sorter for (lowercase) French text (I don't think I left out any accented characters, but I could be wrong).

    use Language::MySort;

    *french_sort = lang_sort(
      # *the character list*
      # only includes the characters remaining after
      # the identical-character map has been applied
      'a' .. 'z',
      {
        # *the identical-character map*
        # maps characters to the character
        # they should sort identically as
        # "AXYZ" means that X, Y, and Z are translated as A
        identical => ["a\340", "c\347", "e\350\351\352\353", "o\364"],
      }
    );

    {
      local $, = " ";
      print french_sort(
        "\351tude", "\352tre", "tr\350s", "entrer", "\351t\351",
      );
    }

And here's a sample run for a small language of 10 characters in which the vowels "a", "e", and "i" sort before every other letter, and which ignores the language's mid-word punctuation, "-" and ".":

    use Language::MySort;

    *weird_sort = lang_sort(
      # place vowels ahead of consonants
      qw( a e i b c d f g h j ),
      {
        # map uppercase characters to lowercase
        identical => [qw( aA bB cC dD eE fF gG hH iI jJ )],
        # ignore - and .
        ignore => "-.",
      }
    );

Because of the way the generator function works (using the tr/// operator), you can also write the above function call as:

    use Language::MySort;

    *weird_sort = lang_sort(
      # place vowels ahead of consonants
      qw( a e i ), 'a' .. 'j',
      {
        # map uppercase characters to lowercase
        identical => [qw( aA bB cC dD eE fF gG hH iI jJ )],
        # ignore - and .
        ignore => "-.",
      }
    );

Even though the vowels are duplicated in the character list, the transliteration operator recognizes only the first occurrence of each. It's a bit of Perl magic that the module takes advantage of to make your life a bit easier.
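To see the duplicate handling in isolation, here is a minimal sketch that applies tr/// directly with the duplicated list written out by hand. The helper name sort_key and the sample word are made up for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Search list as produced by qw( a e i ), 'a' .. 'j': "aeiabcdefghij".
# Each character is replaced by the character whose code is its position
# in the list; a repeated character keeps the mapping of its first occurrence.
sub sort_key {
    my ($word) = @_;
    (my $key = $word) =~ tr/aeiabcdefghij/\000-\014/;
    return $key;
}

# Show the numeric rank assigned to each letter of a sample word.
my @ranks = map { ord } split //, sort_key("jade");
print "@ranks\n";   # prints "12 0 6 1"
```

The vowels get ranks 0, 1, and 2 from their first appearance. The later copies at positions 3, 7, and 11 are never used, so those ranks simply go unassigned and the consonants keep their relative order.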
Finally, here's a simpler sorter for English alphabetical order that puts capital letters before their lowercase counterparts, but intersperses uppercase and lowercase words (so you get Axxx axxx Bxxx bxxx, not Axxx Bxxx axxx bxxx).
    use Language::MySort;

    *sorter = lang_sort(
      # nifty way to make (A, a, B, b, C, c, ... Z, z)
      (map +($_, lc), 'A' .. 'Z'),
      { ignore => q{-} }
    );
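If you want to check the interleaved ordering without installing anything, the following standalone sketch gives the same kind of result. Note that it swaps in a hash lookup for the module's tr/// approach, and the names %rank and sort_key, as well as the word list, are invented for this example.

```perl
use strict;
use warnings;

# (A, a, B, b, ... Z, z), built with the same map expression as above.
my @alphabet = map +($_, lc), 'A' .. 'Z';

# Rank of each character is its index in the alphabet list.
my %rank;
@rank{@alphabet} = 0 .. $#alphabet;

# Hypothetical helper: drop characters outside the alphabet (such as "-"),
# then replace each remaining character with chr(rank).
sub sort_key {
    my ($word) = @_;
    return join "", map { chr $rank{$_} } grep { exists $rank{$_} } split //, $word;
}

my @sorted = sort { sort_key($a) cmp sort_key($b) }
             qw( banana Apple co-op apple Banana );
print "@sorted\n";   # prints "Apple apple Banana banana co-op"
```

Because "A" ranks just below "a" and both rank below "B", capitalized and lowercase words interleave instead of forming two separate runs.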