Here's the Vietnamese alphabet sort order. And here's how to read that chart:
Here's how I handled Japanese sorting (hiragana only) based on a similar chart for Japanese:
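use strict;
use warnings;
use utf8;                             # the source contains literal kana
binmode STDOUT, ':encoding(UTF-8)';   # so the kana print correctly

# Map voiced (dakuten), semi-voiced (handakuten), and small kana to their
# base kana so that they compare as equal at the first sort level.
sub transliterate {
    my $str = shift;
    $str =~
        tr(がぎぐげござじずぜぞだぢづでどばびぶべぼぱぴぷぺぽっゃゅょ)
          (かきくけこさしすせそたちつてとはひふへほはひふへほつやゆよ);
    return $str;
}

# Compare on the folded key first; break ties with the original reading.
sub gozyuuon {
    $a->{'sort'} cmp $b->{'sort'} ||
    $a->{'reading'} cmp $b->{'reading'};
}

my @rows = (
    { word => '同時', reading => 'どうじ' },
    { word => '当日', reading => 'とうじつ' },
    { word => '同士', reading => 'どうし' },
    { word => '投資', reading => 'とうし' },
    { word => '当時', reading => 'とうじ' },
    { word => '同室', reading => 'どうしつ' },
);

# create a sort key with the dakuten (") stripped
for (@rows) {
    $_->{'sort'} = transliterate($_->{reading});
}

for my $row (sort gozyuuon @rows) {
    printf "%s・%s\n", $row->{reading}, $row->{word};
}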
Japanese is a bit easier since the Unicode codepoints for hiragana are already in the correct order; I only needed to handle the characters that share a sort position (voiced, semi-voiced, and small kana).
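
To make the two-level comparison concrete, here is a minimal standalone sketch that works on bare reading strings instead of hash rows. The helper names base_kana and compare_readings are hypothetical, made up for this illustration; the kana mapping is the same one used in transliterate above.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: fold voiced, semi-voiced, and small kana onto their
# base kana (same mapping as transliterate() in the post).
sub base_kana {
    my ($str) = @_;
    $str =~ tr/がぎぐげござじずぜぞだぢづでどばびぶべぼぱぴぷぺぽっゃゅょ/かきくけこさしすせそたちつてとはひふへほはひふへほつやゆよ/;
    return $str;
}

# Hypothetical helper: compare the folded forms first, and fall back to the
# raw readings only when the folded forms are identical.
sub compare_readings {
    my ($x, $y) = @_;
    return base_kana($x) cmp base_kana($y) || $x cmp $y;
}

my @readings = qw(どうじ とうじつ どうし とうし とうじ どうしつ);
my @sorted   = sort { compare_readings($a, $b) } @readings;

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @sorted;
```

With this comparison, とうし, とうじ, どうし, and どうじ all share the folded key とうし and so sort together, ahead of とうじつ and どうしつ. A plain codepoint sort would instead put every と-initial reading before every ど-initial one.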
In reply to Re: Sorting Vietnamese text by Anonymous Monk, in thread Sorting Vietnamese text by pdenisowski