karmas has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of people's names in Russian (KOI8-RU encoding). How can I sort it by last name, taking the encoding into account?

Replies are listed 'Best First'.
Re: Sorting russian text
by Hanamaki (Chaplain) on Nov 02, 2001 at 22:21 UTC
    Unfortunately there is no Russian sort module on CPAN. Therefore you have to roll your own sort routine, or try to use locale if your system supports Russian locales.
    Using locales means that you lose a lot of portability, so I don't want to advise you to use them.

    Since Russian encodings use only one byte (8 bits) per character, it shouldn't be that difficult to implement basic Russian sorting, I presume. (Probably "e" and "ë" could be a problem.) Anyway, Sean M. Burke's article on international sorting and the Sort::ArbBiLex module should get you started.

    Hanamaki
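
    As a rough illustration of the roll-your-own approach mentioned above (a sketch, not code from this thread): it assumes the names are KOI8-R byte strings with the last name first, and that the Encode module is available. KOI8-RU may differ from KOI8-R in a few code points, which this table does not handle. The helper names koi8r_sort_key and sort_koi8r are made up for this example, and it does not use Sort::ArbBiLex. Because KOI8-R does not store Cyrillic letters in alphabetical order, each byte is mapped to a rank taken from the Russian alphabet, with "ё" placed right after "е" and upper and lower case ranked alike.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Russian alphabet in dictionary order (lowercase), as Unicode code points:
# U+0430..U+0435 (a..ie), U+0451 (io), U+0436..U+044F (zhe..ya).
my @alphabet = map { chr } (0x0430 .. 0x0435, 0x0451, 0x0436 .. 0x044F);

# Rank table indexed by byte value. Non-Cyrillic bytes keep their own value;
# Cyrillic letters get ranks above 255, in alphabet order, same rank for both cases.
my @rank = (0 .. 255);
for my $i (0 .. $#alphabet) {
    for my $letter ($alphabet[$i], uc $alphabet[$i]) {
        my $copy = $letter;                        # do not hand encode() an alias
        my $byte = ord encode('koi8-r', $copy);    # KOI8-R byte for this letter
        $rank[$byte] = 256 + $i;
    }
}

# Hypothetical helper: turn a KOI8-R byte string into a key that plain cmp
# orders alphabetically (each rank packed as a 16-bit big-endian integer).
sub koi8r_sort_key {
    my ($bytes) = @_;
    return pack 'n*', map { $rank[$_] } unpack 'C*', $bytes;
}

# Hypothetical helper: sort KOI8-R byte strings, computing each key only once.
sub sort_koi8r {
    my @names = @_;
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ koi8r_sort_key($_), $_ ] } @names;
}
```

    With a hash like the one in the question, this would be called as sort_koi8r(keys %authors). If the strings start with the first name, the key would have to be built from the last-name part instead.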
Re: Sorting russian text
by Zaxo (Archbishop) on Nov 03, 2001 at 15:56 UTC

    Perl can use the libc locale functionality. See 'man perllocale' for the grisly details and warnings. If your environment is already localized to Russian, it may be enough to say:

    use locale;
    before the sort.

    After Compline,
    Zaxo

Re: Sorting russian text
by karmas (Sexton) on Nov 03, 2001 at 22:50 UTC
    I've tried:
    use locale;
    use POSIX qw(locale_h);
    ....
    setlocale(LC_CTYPE, 'Russian_Russia.20866') or die "Can't set locale: $!";
    foreach $name (sort keys %authors) {
        ($letter) = ($name =~ /^(.)/);
        $letter = uc $letter;
        if ($letter ne $last_letter) {
            print IND "<h2>$letter</h2>\n";
            $last_letter = $letter;
        }
        print IND qq{<a href="$authors{$name}[1]\\index.html">$authors{$name}[0]</a><br>\n};
    }
    Strangely enough, 'uc' works, but 'sort' does some mysterious randomizing instead of doing its job. Maybe the locale mechanism in Win2k is broken, or is the problem that my current locale is not Russian?
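
    One possible explanation, offered as a guess rather than something confirmed in this thread: the code above sets only LC_CTYPE, the category that governs case mapping, which would fit 'uc' working. Under 'use locale', string comparison and a plain 'sort' consult LC_COLLATE instead. A minimal sketch that sets the collation category is shown below. The function name sort_with_collation is made up for this example, and whether a given locale name is accepted depends entirely on the platform.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: sort @names using the collation rules of $locale_name,
# then restore whatever collation locale was active before the call.
sub sort_with_collation {
    my ($locale_name, @names) = @_;

    my $previous = setlocale(LC_COLLATE);          # query the current setting
    defined setlocale(LC_COLLATE, $locale_name)
        or die "Can't set LC_COLLATE to '$locale_name'\n";

    my @sorted;
    {
        use locale;                # lexically scoped: affects only this block
        @sorted = sort @names;     # default comparison now follows LC_COLLATE
    }

    setlocale(LC_COLLATE, $previous) if defined $previous;
    return @sorted;
}
```

    It would be called as sort_with_collation('Russian_Russia.20866', keys %authors) with the locale name from the post above, or with a name such as 'ru_RU.KOI8-R' on systems that provide it. If the system rejects the name, the function dies rather than silently sorting in byte order.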