in reply to Re^3: Sorting Vietnamese text
in thread Sorting Vietnamese text
I think getting Unicode::Collate to work would be the best approach, but here's a hand-rolled comparison that seems to sort the way you want:
use utf8;
use 5.014;
use warnings;
use List::Util qw/min/;
binmode STDOUT, ':encoding(UTF-8)';
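
# Map each letter to its rank in the desired collation order.
# Assumes precomposed (NFC) lowercase input.
my %order;
{
    my $source = join '', 'aáàảãạăắ',
                          'ằẳẵặâấầẩẫậbcdđeéèẻẽẹêếềểễ',
                          'ệfghiíìỉĩịjklmnoóòỏõọôốồổ',
                          'ỗộơớờởỡợpqrstuúùủũụưứừửữự',
                          'vwxyýỳỷỹỵz';
    my $cnt = 0;
    $order{$_} = ++$cnt for split //, $source;

    # Compare two strings letter by letter using %order.
    # Characters missing from the table (spaces, uppercase, ...) rank as 0.
    sub vcmp($$) {
        my ($a, $b) = @_;
        for (0 .. min(length($a), length($b)) - 1) {
            my $cmp = ($order{ substr $a, $_, 1 } // 0)
                  <=> ($order{ substr $b, $_, 1 } // 0);
            return $cmp if $cmp != 0;
        }
        # The common prefix is equal, so the shorter string sorts first.
        return length($a) <=> length($b);
    }
}

# Prints, one per line: ám số, ãm, ấm chè, ầm
say for sort { vcmp($a, $b) } ('ầm', 'ãm', 'ấm chè', 'ám số');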
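
For comparison, here is a rough sketch of the Unicode::Collate route mentioned at the top, using Unicode::Collate::Locale with the 'vi' locale. The helper name sort_vi is made up for this example, and it assumes your installed Unicode::Collate ships the Vietnamese tailoring. The resulting order follows CLDR's rules for Vietnamese, so it is not guaranteed to match the hand-rolled table above.

```perl
use utf8;
use 5.014;
use warnings;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# sort_vi is a hypothetical helper name, not part of any module.
sub sort_vi {
    my @words = @_;
    # 'vi' asks for the Vietnamese tailoring bundled with Unicode::Collate.
    my $collator = Unicode::Collate::Locale->new(locale => 'vi');
    return $collator->sort(@words);
}

say for sort_vi('ầm', 'ãm', 'ấm chè', 'ám số');
```

If the tailoring is not available, the collator silently falls back to the default rules; calling getlocale on the collator object reports which one was actually picked.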
Re^5: Sorting Vietnamese text
by pdenisowski (Acolyte) on Dec 23, 2013 at 16:05 UTC