in reply to Re: Sorting Vietnamese text
in thread Sorting Vietnamese text
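use strict;
use warnings;
use utf8;                             # the tr lists below are UTF-8 literals
binmode STDOUT, ':encoding(UTF-8)';   # the words printed below contain wide characters

# Map each letter to a key character so that a plain string comparison
# follows Vietnamese alphabetical order; tone marks collapse onto the base vowel.
sub make_sort_order {
    my $str = shift;
    $str =~
        tr(aáàảãạăaáàảãạăắằẳẵặâấầẩẫậbcdđeéèẻẽẹêếềểễệfghiíìỉĩịjklmnoóòỏõọôốồổỗộơớờởỡợpqrstuúùủũụưứừửữựvwxyýỳỷỹỵz)
          (00000011111111111112222223456777777888888abcddddddefghijjjjjjkkkkkkllllllmnopqrrrrrrsssssstuvwwwwwwx)d;
    return $str;
}

my @words = ('ầm', 'ãm', 'ấm chè', 'ám số');

# Schwartzian transform: compute each key once, sort on it, print the original word.
print $_->[1], "\n" for
    sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] }
    map  { [ make_sort_order($_), $_ ] } @words;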
It's still missing a correct 'secondary sort' (for the edge case where two words are identical once the tone marks are collapsed, so the tiebreak falls back to a raw codepoint comparison); it should not be difficult to add once someone figures out a suitable transliteration of the tone marks that sorts asciibetically.
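One way such a secondary key might look, as a rough sketch rather than a tested solution: decompose each word with `Unicode::Normalize` and emit one digit per character for its tone mark. The names `make_tone_key` and `%tone_rank` are made up for illustration, and the tone order (no mark, acute, grave, hook above, tilde, dot below) is an assumption copied from the order of the letters in the `tr` list; swap the ranks if your dictionary orders the tones differently.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Assumed tone order: unmarked < acute < grave < hook above < tilde < dot below.
my %tone_rank = (
    "\x{0301}" => 1,   # acute (sắc)
    "\x{0300}" => 2,   # grave (huyền)
    "\x{0309}" => 3,   # hook above (hỏi)
    "\x{0303}" => 4,   # tilde (ngã)
    "\x{0323}" => 5,   # dot below (nặng)
);

# Hypothetical helper: returns one digit per character, 0 when it carries no tone mark.
sub make_tone_key {
    my $str = NFD(shift);            # split letters from their combining marks
    my $key = '';
    # \X matches a base character together with its combining marks
    for my $cluster ($str =~ /\X/g) {
        my ($tone) = $cluster =~ /([\x{0300}\x{0301}\x{0303}\x{0309}\x{0323}])/;
        $key .= defined $tone ? $tone_rank{$tone} : 0;
    }
    return $key;
}

binmode STDOUT, ':encoding(UTF-8)';
my @words = ('mạ', 'má', 'ma', 'mã', 'mà', 'mả');
# Prints ma, má, mà, mả, mã, mạ (one per line) under the assumed tone order.
print "$_\n" for sort { make_tone_key($a) cmp make_tone_key($b) } @words;
```

To plug it into the sort above, the `map` could build `[ make_sort_order($_), make_tone_key($_), $_ ]`, so the comparison falls through from element 0 to element 1 only when the primary keys tie. The key is all digits, so it sorts asciibetically.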