http://qs1969.pair.com?node_id=611365



Just out of curiosity: you said that "i" and "y" are treated the same. Would it still be right if you swapped "ia" and "ya" in that list?

I'm not Lithuanian; I just studied it a little at university. From what I've seen in dictionaries and grammar books, when the letters following I/Y are the same, I comes before Y.
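To make that rule concrete, here is a minimal Perl sketch that implements it by hand instead of relying on the locale. `lt_cmp` is a hypothetical name, and it only folds y into i; the rest of the Lithuanian alphabet (ą, č, į and so on) is deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical comparator for the rule above: compare words with "y" folded
# to "i" first; only if that ties, fall back to plain codepoint order,
# where "i" (0x69) sorts before "y" (0x79).
sub lt_cmp {
    my ($s, $t) = @_;
    (my $ks = lc $s) =~ tr/y/i/;       # primary key: y treated as i
    (my $kt = lc $t) =~ tr/y/i/;
    return $ks cmp $kt || $s cmp $t;   # secondary key: original spelling
}

my @words  = qw(ia ic ib ya yb yc);
my @sorted = sort { lt_cmp($a, $b) } @words;
print "@sorted\n";    # ia ya ib yb ic yc
```

Because ties are broken deterministically, swapping "ia" and "ya" in the input does not change the output.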

Does the Unix utility sort(1) behave correctly?

I tried running this:

    [root@sugarcube loc]# LC_COLLATE="lt_LT"
    [root@sugarcube loc]# export LC_COLLATE
    [root@sugarcube loc]# locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE=lt_LT
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    [root@sugarcube loc]# cat ia.txt
    ia
    ic
    ib
    ya
    yb
    yc
    [root@sugarcube loc]# sort ia.txt
    ia
    ib
    ic
    ya
    yb
    yc

It looks like sort(1) did something, but not what I expected. I'm not sure I changed the locale correctly; I'm not a Unix expert. Any help will be appreciated.
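One way to check from inside Perl whether the locale was set correctly is to ask the C library whether it accepts the locale name at all. This is a minimal sketch; the names tried below are guesses (`locale -a` lists what is actually installed), and `collate_locale_available` is a hypothetical helper. If no Lithuanian name is accepted, programs usually keep the default C ordering, which would be consistent with the plain ASCII output above, though that diagnosis is an assumption the transcript alone does not prove.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Try to switch LC_COLLATE to $name and report whether the C library
# accepted it, then restore whatever collation locale was active before.
sub collate_locale_available {
    my ($name) = @_;
    my $old = setlocale(LC_COLLATE);                  # query current setting
    my $ok  = defined setlocale(LC_COLLATE, $name);   # undef means rejected
    setlocale(LC_COLLATE, $old) if defined $old;      # restore
    return $ok;
}

# Assumed candidate names; the exact spelling varies between systems.
for my $name (qw(lt_LT lt_LT.UTF-8 lt_LT.utf8)) {
    printf "%-12s %s\n", $name,
        collate_locale_available($name) ? "available" : "NOT available";
}
```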

Re^3: Sorting according to locale collation
by betterworld (Curate) on Apr 22, 2007 at 14:42 UTC

    It looks like sort(1) prints the lines in the same order as Perl's sort does. So I guess the problem is that the locale itself does not treat i and y as equal. (I don't know whether that's possible at all.)

    According to perldoc perllocale, the locale answers the question "which of these letters comes first". I don't think that the answer "neither i nor y comes first, but i comes first if it is the only difference in the whole word" is allowed.

Re^3: Sorting according to locale collation
by Krambambuli (Curate) on Apr 22, 2007 at 15:55 UTC
    What is the output if you add, say,

    ha
    ja

    to your test data set?
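The point of adding ha and ja is presumably to see where the y-words land. In the Lithuanian alphabet Y comes right after I (and Į) and before J, whereas plain C/ASCII ordering puts every y-word after ja. Below is a minimal Perl sketch of that check, assuming the locale is installed as lt_LT.UTF-8 (pass another name on the command line). Under `use locale`, Perl's sort compares via the current LC_COLLATE, so this also shows whether Perl sees the same order as sort(1).

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);
use locale;    # under "use locale", sort and cmp use the current LC_COLLATE

# Assumption: the locale name; check `locale -a` for the real one.
my $name = shift(@ARGV) // 'lt_LT.UTF-8';
defined setlocale(LC_COLLATE, $name)
    or warn "setlocale($name) failed; keeping the current collation\n";

my @words  = qw(ia ic ib ya yb yc ha ja);
my @sorted = sort @words;
print "@sorted\n";

# Where "ya" lands relative to "ja" hints at which ordering is in effect.
my %pos;
@pos{@sorted} = 0 .. $#sorted;
print $pos{ya} < $pos{ja}
    ? "y sorts before j: looks like Lithuanian-style collation\n"
    : "y sorts after j: looks like plain C/ASCII ordering\n";
```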