(1) that's still not the correct sort order (á should come before à)
I've no idea, but this page indicates the opposite. You may have to create a custom sort to override the default if desired.
(2) I actually get a different "sorted" list when I run the same exact code. This is the problem that I have - it seems the sort algorithms ignore the tone marks.
Perhaps it has to do with normalization. I still get the same sort order when using it.
#!/usr/bin/env perl
use v5.14;
use warnings;
use utf8::all;                 # UTF-8 for source literals and standard I/O
use Unicode::Collate::Locale;
use Unicode::Normalize;

# Collator tailored for Vietnamese
my $Collator = Unicode::Collate::Locale->new(locale => 'vi');
my @unsorted = ('á', 'ả', 'ã', 'à', 'ậ', 'ă', 'ạ', 'ẫ', 'a', 'ẩ');

# Decompose to NFD before sorting, then recompose to NFC for display
@unsorted = map { NFD($_) } @unsorted;
my @sorted = $Collator->sort(@unsorted);
say NFC("unsorted\n@unsorted");
say NFC("sorted\n@sorted");
| [reply] |
unsorted
ỷ : (1) to be fat (said of a pig); (2) to depend on
ỳ : inertia, state of inactivity, stay out, inert, sluggish
ỳ ạch : to toil, labor with difficulty
ỷ eo : reproach someone with something
ỷ lại : to depend, rely on others
ỷ thế : count on one’s power, one’s position, one’s influence
yêu nhau : to love each other, be in love
yêu quí : precious, valuable
sorted
ỷ : (1) to be fat (said of a pig); (2) to depend on
ỳ ạch : to toil, labor with difficulty
ỷ eo : reproach someone with something
yêu nhau : to love each other, be in love
yêu quí : precious, valuable
ỳ : inertia, state of inactivity, stay out, inert, sluggish
ỷ lại : to depend, rely on others
ỷ thế : count on one’s power, one’s position, one’s influence
| [reply] |
sorted
ỷ : (1) to be fat (said of a pig); (2) to depend on
ỳ ạch : to toil, labor with difficulty
ỷ eo : reproach someone with something
yêu nhau : to love each other, be in love
yêu quí : precious, valuable
ỳ : inertia, state of inactivity, stay out, inert, sluggish
ỷ lại : to depend, rely on others
ỷ thế : count on one’s power, one’s position, one’s influence
Okay, I also get that output when using the entire lines as written.
However, cutting those lines short at or before the colon ':' gives this:
sorted
ỳ :
ỷ :
ỳ ạch :
ỷ eo :
yêu nhau :
yêu quí :
ỷ lại :
ỷ thế :
It looks as though the English translation following each Vietnamese headword is interfering with the sort: Vietnamese collation rules are syllable-based and fairly complicated, so the extra text changes how the entries compare.
I'd suggest separating the two parts, for example by splitting on the colon and storing them in a hash, so that the sort is based only on the Vietnamese.
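A minimal sketch of that suggestion, assuming every line has the form "headword : translation" and that headwords are unique (a hash would silently overwrite duplicates). The variable names are made up for illustration, and the plain utf8 and open pragmas stand in for utf8::all so that only core modules are needed.

```perl
#!/usr/bin/env perl
use v5.14;
use warnings;
use utf8;                  # the source below contains UTF-8 literals
use open qw(:std :utf8);   # encode STDOUT/STDERR as UTF-8
use Unicode::Collate::Locale;

my $collator = Unicode::Collate::Locale->new(locale => 'vi');

# Sample entries copied from the post above.
my @lines = (
    'ỷ : (1) to be fat (said of a pig); (2) to depend on',
    'ỳ : inertia, state of inactivity, stay out, inert, sluggish',
    'ỳ ạch : to toil, labor with difficulty',
    'ỷ eo : reproach someone with something',
    'ỷ lại : to depend, rely on others',
    'ỷ thế : count on one’s power, one’s position, one’s influence',
    'yêu nhau : to love each other, be in love',
    'yêu quí : precious, valuable',
);

# Split each line on the first colon only: headword => translation.
# Assumption: headwords are unique, otherwise later entries win.
my %entry;
for my $line (@lines) {
    my ($headword, $gloss) = split /\s*:\s*/, $line, 2;
    $entry{$headword} = $gloss;
}

# Sort on the Vietnamese headwords alone, then reattach the translations.
my @sorted = map { "$_ : $entry{$_}" } $collator->sort(keys %entry);
say for @sorted;
```

If duplicate headwords are possible, an array of [headword, translation] pairs sorted with the collator's cmp method would avoid the overwrite.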
| [reply] |
Are you absolutely certain your text is Unicode (UTF-8)? It's not TCVN (CP1258, ISO-2022-VN or EUC-VN), is it?
I'm sorry if this is an "Is the power cord plugged in?" kind of question, but it just doesn't make sense that you're getting different output than farang got.
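One rough way to check, sketched here with the core Encode module: read the file as raw bytes and see whether it decodes as strict UTF-8. The helper name looks_like_utf8 is made up for this example. A clean decode only shows the bytes are well-formed UTF-8 (pure ASCII passes trivially), so treat it as a sanity check rather than proof.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode ();

# Returns 1 if the byte string decodes cleanly as strict UTF-8, else 0.
# LEAVE_SRC keeps decode() from modifying the caller's buffer.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Usage: perl check_utf8.pl wordlist.txt   (the file name is hypothetical)
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $bytes = do { local $/; <$fh> };   # slurp the whole file as bytes
    close $fh;
    print looks_like_utf8($bytes) ? "valid UTF-8\n" : "not valid UTF-8\n";
}
```

Text saved in a single-byte Vietnamese code page will normally fail this check as soon as it contains an accented letter.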
Jim
| [reply] |
Thanks! I'll give it a try.
When I learned Vietnamese, the order of the tones in every dictionary (all my older ones) was
a á à ả ã ạ
Some of my newer dictionaries use the order you mention above, but after twenty years of doing it one way, it's a little hard to change :)
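For the older order, a custom comparison could look something like the sketch below. It is only an illustration under some assumptions: it handles single syllables (at most one tone mark each), the names syllable_key and by_old_tone_order are invented here, and the base letters are still compared with the stock 'vi' collator so that only the tone ranking is overridden.

```perl
#!/usr/bin/env perl
use v5.14;
use warnings;
use utf8;                  # the source below contains UTF-8 literals
use open qw(:std :utf8);
use Unicode::Normalize qw(NFD NFC);
use Unicode::Collate::Locale;

my $collator = Unicode::Collate::Locale->new(locale => 'vi');

# Older dictionary order: no tone, acute, grave, hook above, tilde, dot below.
my %tone_rank = (
    "\x{0301}" => 1,   # combining acute      (á)
    "\x{0300}" => 2,   # combining grave      (à)
    "\x{0309}" => 3,   # combining hook above (ả)
    "\x{0303}" => 4,   # combining tilde      (ã)
    "\x{0323}" => 5,   # combining dot below  (ạ)
);
my $tone_mark = qr/[\x{0300}\x{0301}\x{0303}\x{0309}\x{0323}]/;

# Split one syllable into (base without tone mark, tone rank).
# Vowel-quality marks (circumflex, breve, horn) stay on the base.
sub syllable_key {
    my ($syllable) = @_;
    my $nfd = NFD($syllable);
    my ($tone) = $nfd =~ /($tone_mark)/;
    (my $base = $nfd) =~ s/$tone_mark//g;
    return (NFC($base), defined $tone ? $tone_rank{$tone} : 0);
}

# Compare base letters first, then break ties with the tone rank.
sub by_old_tone_order {
    my ($base_a, $tone_a) = syllable_key($a);
    my ($base_b, $tone_b) = syllable_key($b);
    return $collator->cmp($base_a, $base_b) || $tone_a <=> $tone_b;
}

my @unsorted = ('à', 'ạ', 'a', 'ã', 'á', 'ả');
my @sorted   = sort by_old_tone_order @unsorted;
say "@sorted";   # a á à ả ã ạ
```

Multi-syllable entries would need this key computed per syllable, which the sketch does not attempt.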
There are also some differences in how initial consonant clusters are handled: does "thu" come before "tu"? (In my older dictionaries "th" and "tr" are considered single "letters", kind of like c and ch in Spanish.) I figured I would let this slide for now ...
Thanks again!
| [reply] |