unsorted
ỷ : (1) to be fat (said of a pig); (2) to depend on
ỳ : inertia, state of inactivity, stay out, inert, sluggish
ỳ ạch : to toil, labor with difficulty
ỷ eo : reproach someone with something
ỷ lại : to depend, rely on others
ỷ thế : count on one’s power, one’s position, one’s influence
yêu nhau : to love each other, be in love
yêu quí : precious, valuable
sorted
ỷ : (1) to be fat (said of a pig); (2) to depend on
ỳ ạch : to toil, labor with difficulty
ỷ eo : reproach someone with something
yêu nhau : to love each other, be in love
yêu quí : precious, valuable
ỳ : inertia, state of inactivity, stay out, inert, sluggish
ỷ lại : to depend, rely on others
ỷ thế : count on one’s power, one’s position, one’s influence
sorted
ỷ : (1) to be fat (said of a pig); (2) to depend on
ỳ ạch : to toil, labor with difficulty
ỷ eo : reproach someone with something
yêu nhau : to love each other, be in love
yêu quí : precious, valuable
ỳ : inertia, state of inactivity, stay out, inert, sluggish
ỷ lại : to depend, rely on others
ỷ thế : count on one’s power, one’s position, one’s influence
Okay, I also get that output when using the entire lines as
written.
However, cutting those lines short at or before the colon ':' gives this.
sorted
ỳ :
ỷ :
ỳ ạch :
ỷ eo :
yêu nhau :
yêu quí :
ỷ lại :
ỷ thế :
What seems to be going on is that due to the complicated rules for
ordering in Vietnamese based on syllables, having the English
translation after the Vietnamese is messing up the sorting.
I'd suggest trying to separate them into a hash if possible (split on
the colon, maybe) so the sort can be based only on the Vietnamese.
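A minimal Python sketch of that suggestion, assuming every line has the form "headword : gloss" and that the first colon is the separator. The names split_entry and sort_entries are made up for illustration. A list of pairs is used rather than a hash so that repeated headwords are not lost. Note that the default ordering here is plain code-point order, not Vietnamese dictionary order; the key argument is only a hook where a proper collation key could be plugged in.

```python
import unicodedata


def split_entry(line):
    """Split 'headword : gloss' at the first colon; the gloss may be empty."""
    headword, _, gloss = line.partition(":")
    # Normalize so precomposed and decomposed input compare the same way.
    return unicodedata.normalize("NFC", headword.strip()), gloss.strip()


def sort_entries(lines, key=None):
    """Sort entries by headword only, ignoring the English gloss.

    `key` maps a headword to a sort key; without it, Python's default
    code-point comparison is used.
    """
    entries = [split_entry(line) for line in lines if line.strip()]
    if key is None:
        return sorted(entries, key=lambda entry: entry[0])
    return sorted(entries, key=lambda entry: key(entry[0]))
```

Each result is a (headword, gloss) pair, so the original line can be rebuilt with " : ".join(pair) after sorting.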
Thanks again, but what I'm looking for is that every word starting with ỳ comes before any word starting with ỷ, so even that sort order isn't quite right. Also, why are all the entries with ỷ not together? Instead of
sorted
ỳ :
ỷ :
ỳ ạch :
ỷ eo :
yêu nhau :
yêu quí :
ỷ lại :
ỷ thế :
should be
sorted
ỳ :
ỳ ạch :
ỷ :
ỷ eo :
ỷ lại :
ỷ thế :
yêu nhau :
yêu quí :
This is how all paper dictionaries do it, regardless of which order they use for the tone marks. I'm beginning to wonder if I'm the only person who's ever cared about this before :)
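One way to get that order is to compare headwords syllable by syllable, with each syllable keyed first on its letters and only then on its tone. The Python sketch below is an illustration under stated assumptions, not an established library routine: the names syllable_key and vietnamese_key are hypothetical, the tone order is the older a á à ả ã ạ one, the alphabet is the usual 29-letter order, and syllables are assumed to be separated by spaces.

```python
import unicodedata

# Assumed tone order (a á à ả ã ạ): level, acute, grave, hook above, tilde, dot below.
TONE_MARKS = ["\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]
TONE_RANK = {mark: rank for rank, mark in enumerate(TONE_MARKS, start=1)}

# Assumed letter order; normalized so each letter is a single code point.
ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêghiklmnoôơpqrstuưvxy")
LETTER_RANK = {ch: rank for rank, ch in enumerate(ALPHABET)}


def syllable_key(syllable):
    """Return (letter ranks, tone rank) for one syllable."""
    tone = 0  # 0 means no tone mark (level tone)
    base = []
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in TONE_RANK:
            tone = TONE_RANK[ch]  # pull the tone mark off the vowel
        else:
            base.append(ch)
    # Recompose so that ê, ô, ơ, ư, ă, â are single letters again.
    letters = unicodedata.normalize("NFC", "".join(base))
    # Characters outside the alphabet sort after it, by code point.
    ranks = tuple(LETTER_RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in letters)
    return (ranks, tone)


def vietnamese_key(headword):
    """Sort key that compares a headword one syllable at a time."""
    return tuple(syllable_key(s) for s in headword.split())
```

With these assumptions, sorted(headwords, key=vietnamese_key) puts the eight headwords from this thread in the order ỳ, ỳ ạch, ỷ, ỷ eo, ỷ lại, ỷ thế, yêu nhau, yêu quí. Switching to the newer tone order would only mean reordering TONE_MARKS.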
By the way, the reason I'm doing this is that I'm planning to release a large (>50,000 words) Vietnamese-English dictionary (as a single UTF-8 file) under the CC license (essentially free to use for any purpose) and I'd like to make it available in "properly" sorted order. I've done similar projects for Chinese, Esperanto, and Interlingua already (see www.denisowski.org), but those are a lot easier to sort :)
Any other ideas? Thanks again for the help!
Are you absolutely certain your text is Unicode (UTF-8)? It's not TCVN (CP1258, ISO-2022-VN or EUC-VN), is it?
I'm sorry if this is an "Is the power cord plugged in?" kind of question, but it just doesn't make sense that you're getting different output than farang got.
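A quick way to rule this out is to check whether the raw bytes decode cleanly as UTF-8. The Python sketch below is only a rough test, and the name looks_like_utf8 is made up for illustration. Bytes in a legacy 8-bit encoding such as CP1258 will usually fail the decode, though passing it does not prove the text was meant to be UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

To use it on a file, read the file in binary mode, e.g. open(path, "rb").read(), and pass the result in.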
Jim
Thanks! I'll give it a try.
When I learned Vietnamese, the order of the tones in every dictionary (all my older ones) was
a á à ả ã ạ
Some of my newer dictionaries use the order you mention above, but after twenty years of doing it one way, it's a little hard to change :)
There are also some differences in how initial consonant clusters are handled: does "thu" come before "tu"? (In my older dictionaries "th" and "tr" are considered single "letters", kind of like c and ch in Spanish.) I figured I would let this slide for now ...
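For what it's worth, the older convention could be sketched in Python roughly like this. Everything here is an assumption for illustration: the names cluster_units and cluster_key are hypothetical, only "th" and "tr" are treated as clusters, they are placed right after "t", and the input is taken to be a lowercase syllable with tone marks already removed.

```python
import unicodedata

BASE = unicodedata.normalize("NFC", "aăâbcdđeêghiklmnoôơpqrstuưvxy")

# Hypothetical ordering: "th" and "tr" count as letters placed right after "t".
UNITS = []
for ch in BASE:
    UNITS.append(ch)
    if ch == "t":
        UNITS.extend(["th", "tr"])
UNIT_RANK = {unit: rank for rank, unit in enumerate(UNITS)}


def cluster_units(syllable):
    """Split a toneless lowercase syllable into units, taking 'th'/'tr' greedily."""
    units = []
    i = 0
    while i < len(syllable):
        if syllable[i:i + 2] in ("th", "tr"):
            units.append(syllable[i:i + 2])
            i += 2
        else:
            units.append(syllable[i])
            i += 1
    return units


def cluster_key(syllable):
    """Rank each unit; anything outside the unit list sorts last."""
    return tuple(UNIT_RANK.get(unit, len(UNITS)) for unit in cluster_units(syllable))
```

Under this scheme "tu" sorts before "thu", because plain "t" ranks ahead of the "th" unit; a letter-by-letter sort gives the opposite.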
Thanks again!