http://qs1969.pair.com?node_id=1068106


in reply to Sorting Vietnamese text

Update: Sorry, there were errors in the first code example below. In particular, the collator constructor takes a named locale argument and should be this:

my $Collator = Unicode::Collate::Locale->new(locale =>'vi');
With that constructor, the sort method works as intended. Try it with actual Vietnamese words.
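A quick way to try real words is to compare Perl's built-in sort, which orders strings by code point, with the Vietnamese collator. This is a minimal sketch: the word list is an illustrative assumption, not from the original post, and it uses core modules plus a binmode instead of utf8::all. The accented letters are written as escapes for precomposed characters so the file encoding does not matter.

```perl
#!/usr/bin/env perl
use v5.14;
use warnings;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';   # print UTF-8 without utf8::all

# Named argument, as in the corrected constructor
my $Collator = Unicode::Collate::Locale->new(locale => 'vi');

# Illustrative words (assumption, not from the post): ba ăn âm anh an
my @words = ("ba", "\x{103}n", "\x{E2}m", "anh", "an");

my @by_codepoint = sort @words;               # built-in sort: code point order
my @by_collator  = $Collator->sort(@words);   # Vietnamese collation

say "locale:    ", $Collator->getlocale;      # 'default' would mean no tailoring was applied
say "codepoint: @by_codepoint";
say "collator:  @by_collator";
```

With the Vietnamese tailoring, ă and â are separate letters after a, so the collator should give an anh ăn âm ba, while the code point sort pushes âm and ăn after ba.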

Unicode::Collate::Locale ought to help. The example code below is not in code tags because of a display bug with UTF-8 text.

#!/usr/bin/env perl
use v5.14;
use warnings;
use utf8::all;

use Unicode::Collate::Locale;
my $Collator = Unicode::Collate::Locale->new(locale => 'vi');  # named argument; the one-argument form new('vi') does not select the Vietnamese locale

my @unsorted = qw(
                  a..7
                  ả..3
                  à..9
                  ạ..5
                  ã..4
                  á..1
                  ă..6
                  à..2
                  á..8
                 );

my @sorted = $Collator->sort(@unsorted);

say "unsorted\n@unsorted";
say "sorted\n@sorted";
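Output as originally posted (produced with the one-argument constructor, so this is the default ordering rather than the Vietnamese one):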
unsorted
a..7 ả..3 à..9 ạ..5 ã..4 á..1 ă..6 à..2 á..8
sorted
á..1 à..2 ả..3 ã..4 ạ..5 ă..6 a..7 á..8 à..9
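The posted output is ordered purely by the trailing number. That is what an untailored collation gives: there ă counts as a plus an accent (a secondary difference), so at the primary level every item is just "a" followed by a digit. The sketch below compares a default collator with one built with locale => 'vi' on a shortened, hypothetical variant of the list above; with the Vietnamese tailoring ă is its own letter after a, so the ă item should move to the end. Accented letters are again written as escapes for precomposed characters.

```perl
#!/usr/bin/env perl
use v5.14;
use warnings;
use Unicode::Collate;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical shortened list: a, ă, á, à, ả, ạ each tagged with a number
my @items = (
    "a..7",
    "\x{103}..6",    # ă
    "\x{E1}..1",     # á
    "\x{E0}..2",     # à
    "\x{1EA3}..3",   # ả
    "\x{1EA1}..5",   # ạ
);

my $default = Unicode::Collate->new;                         # untailored default collation
my $vi      = Unicode::Collate::Locale->new(locale => 'vi'); # Vietnamese tailoring

my @default_sorted = $default->sort(@items);
my @vi_sorted      = $vi->sort(@items);

# Print only the trailing digits so the two orders are easy to compare
say "default: ", join ' ', map { /(\d)\z/ } @default_sorted;
say "vi:      ", join ' ', map { /(\d)\z/ } @vi_sorted;
```

The default collator should print 1 2 3 5 6 7 (the same pattern as the posted output), and the Vietnamese one 1 2 3 5 7 6.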

Update #2: The code below is a correct example.

#!/usr/bin/env perl
use v5.14;
use warnings;
use utf8::all;
use Unicode::Collate::Locale;

my $Collator = Unicode::Collate::Locale->new(locale =>'vi');

my @unsorted = ('á', 'ả', 'ã', 'à', 'ậ', 'ă', 'ạ', 'ẫ', 'a', 'ẩ' );
my @sorted = $Collator->sort(@unsorted);

say "unsorted\n@unsorted";
say "sorted\n@sorted";
This gives the output:
unsorted
á ả ã à ậ ă ạ ẫ a ẩ
sorted
a à ả ã á ạ ă ẩ ẫ ậ
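The sort method works on a flat list of strings. To sort records by a Vietnamese field, one option is a Schwartzian transform over getSortKey: each key is computed once and then compared with plain cmp, which gives the same result as $Collator->cmp. Writing sort { $Collator->cmp($a->{word}, $b->{word}) } is the simpler alternative. This is a minimal sketch; the records and their id and word fields are hypothetical.

```perl
#!/usr/bin/env perl
use v5.14;
use warnings;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

my $Collator = Unicode::Collate::Locale->new(locale => 'vi');

# Hypothetical records; only the 'word' field is used for ordering
my @records = (
    { id => 1, word => "b\x{E1}" },     # bá
    { id => 2, word => "ba" },
    { id => 3, word => "\x{103}n" },    # ăn
    { id => 4, word => "b\x{1EA1}" },   # bạ
    { id => 5, word => "an" },
    { id => 6, word => "b\x{E0}" },     # bà
);

# Schwartzian transform: build [sort key, record] pairs, compare the
# binary sort keys with cmp, then unwrap the records again
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ $Collator->getSortKey($_->{word}), $_ ] } @records;

say join ' ', map { "$_->{id}:$_->{word}" } @sorted;
```

Following the letter and tone order shown in the output above, the expected order is an, ăn, ba, bà, bá, bạ, i.e. ids 5 3 2 6 1 4.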