in reply to Sorting Vietnamese text
Here's the Vietnamese alphabet sort order. And here's how to read that chart:
Here's how I handled Japanese sorting (hiragana only) based on a similar chart for Japanese:
use strict;
use warnings;
use utf8;                            # the kana literals below are UTF-8 source text
binmode STDOUT, ':encoding(UTF-8)';  # print wide characters without warnings

# map voiced, semi-voiced and small kana to their base kana
sub transliterate {
    my $str = shift;
    $str =~
        tr(がぎぐげござじずぜぞだぢづでどばびぶべぼぱぴぷぺぽっゃゅょ)
          (かきくけこさしすせそたちつてとはひふへほはひふへほつやゆよ);
    return $str;
}

# compare on the stripped key first, then on the original reading
sub gozyuuon {
    $a->{'sort'} cmp $b->{'sort'} ||
    $a->{'reading'} cmp $b->{'reading'};
}

my @rows = (
    { word => '同時', reading => 'どうじ' },
    { word => '当日', reading => 'とうじつ' },
    { word => '同士', reading => 'どうし' },
    { word => '投資', reading => 'とうし' },
    { word => '当時', reading => 'とうじ' },
    { word => '同室', reading => 'どうしつ' },
);

# create a sort key with the dakuten/handakuten stripped and small kana enlarged
for (@rows) {
    $_->{'sort'} = transliterate($_->{reading});
}

for my $row (sort gozyuuon @rows) {
    printf "%s・%s\n", $row->{reading}, $row->{word};
}
Japanese is a bit easier because the hiragana code points in Unicode are already in the correct order; I only needed to handle the characters that share a sort position (voiced, semi-voiced, and small kana).
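For anyone who wants to try the two-level comparison on bare strings rather than hashes, here is a minimal sketch of the same idea. The helper names `base_kana` and `sort_readings` are made up for this example and are not from the code above; the key is computed once per string and the original reading is only used to break ties.

```perl
use strict;
use warnings;
use utf8;                            # this source contains UTF-8 kana literals
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: fold voiced, semi-voiced and small kana to base kana.
sub base_kana {
    my ($str) = @_;
    (my $key = $str) =~
        tr/がぎぐげござじずぜぞだぢづでどばびぶべぼぱぴぷぺぽっゃゅょ/かきくけこさしすせそたちつてとはひふへほはひふへほつやゆよ/;
    return $key;
}

# Hypothetical helper: sort by folded key, then by raw code points on a tie.
sub sort_readings {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ base_kana($_), $_ ] } @_;
}

print "$_\n" for sort_readings(qw(どうじ とうじつ どうし とうし とうじ どうしつ));
# prints: とうし, とうじ, どうし, どうじ, とうじつ, どうしつ (one per line)
```

The four readings that fold to とうし come first, ordered among themselves by code point, followed by the two that fold to とうしつ. A Vietnamese version would presumably need a larger mapping table plus an explicit letter order, since those code points are not already in alphabetical order.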