heth has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am a teacher at a Danish Technical High School, and had a Perl Network Management Course this week. One of the students submitted a question that we solved, but I could not explain it.

My student was sorting words, but sort() got it all wrong. I discovered that he had use locale; in his script, which I know can cause problems. We solved his problem, but I could not explain what happened.

Environment: ActiveState Perl 5.8.8 build 822 (MSWin32-x86-multi-thread) on Danish XP-pro. The following sample code gives the unexpected result on Windows but not on FreeBSD, Solaris or RedHat. I also tried ActiveState 5.10.0 on XP, which failed in the same way.

use strict;
use warnings;

my @a = qw(a b c d e f g h i j k l A B C D E F G H I J K L);
my @b = qw(aaa aab aba abb Laa lab);

no locale;
print "no locale : ";
print sort(@a),"\n";
use POSIX;
print "POSIX : ";
print sort(@a),"\n";
use locale;
print "use locale: ";
print sort(@a),"\n";

no locale;
print "no locale : ";
foreach (sort(@b)) { printf "%-4s", $_; }
print "\n";
use POSIX;
print "POSIX : ";
foreach (sort(@b)) { printf "%-4s", $_; }
print "\n";
use locale;
print "use locale: ";
foreach (sort(@b)) { printf "%-4s", $_; }

This code produces the following output on XP:

no locale : ABCDEFGHIJKLabcdefghijkl
POSIX : ABCDEFGHIJKLabcdefghijkl
use locale: aAbBcCdDeEfFgGhHiIjJkKlL
no locale : Laa aaa aab aba abb lab
POSIX : Laa aaa aab aba abb lab
use locale: aba abb lab Laa aaa aab

Does anybody know why the sort with use locale produces aba abb lab Laa aaa aab?

Best regards

Henrik Thomsen

Re: use locale; on ActiveState WIN32
by Anonymous Monk on Oct 11, 2008 at 18:31 UTC
    Danish alphabet? :)
    Try this snippet from perllocale
    print "\n-------------------\n";
    use locale;
    print +(sort grep /\w/, map { chr } 0..255), "\n";
    print "\n-------------------\n";
    no locale;
    print +(sort grep /\w/, map { chr } 0..255), "\n";
    print "\n-------------------\n";
    __END__
    no locale : ABCDEFGHIJKLabcdefghijkl
    POSIX : ABCDEFGHIJKLabcdefghijkl
    use locale: aAbBcCdDeEfFgGhHiIjJkKlL
    no locale : Laa aaa aab aba abb lab
    POSIX : Laa aaa aab aba abb lab
    use locale: aaa aab aba abb Laa lab

    -------------------
    _µ01¹2²3³456789aAªáÁàÀâÂäÄãÃåÅæÆbBcCçÇdDðÐeEéÉèÈêÊëËfFƒgGhHiIíÍìÌîÎïÏjJkKlLmMnNñÑoOºóÓòÒôÔöÖõÕøØœŒpPqQrRsSšŠßtTþÞuUúÚùÙûÛüÜvVwWxXyYýÝÿŸzZžŽ
    -------------------
    0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
    -------------------
    It has something to do with LC_COLLATE :)
      hi :-) Output:
      -------------------
      _µ01¹2²3³456789aAªáÁàÀâÂãÃbBcCçÇdDðÐeEéÉèÈêÊëËfFƒgGhHiIíÍìÌîÎïÏjJkKlLmMnNñÑoOºóÓòÒôÔõÕœŒpPqQrRsSšŠßtTþÞuUúÚùÙûÛvVwWxXyYýÝÿŸüÜzZžŽæÆäÄøØöÖåÅ
      -------------------
      0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
      -------------------
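The LC_COLLATE hint in this sub-thread can be checked directly. The following is a minimal sketch, not something posted in the thread: it uses setlocale from the core POSIX module to report the collation locale in effect and then, as an assumption about one possible workaround, forces the "C" collation so that sort under use locale falls back to plain character-code order. The names @words and @sorted are made up for illustration.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

my @words = qw(aaa aab aba abb Laa lab);

# With a single argument, setlocale only queries the current setting.
my $current = setlocale(LC_COLLATE);
print "LC_COLLATE: ", (defined $current ? $current : "(unknown)"), "\n";

# Assumption for this sketch: switch collation to the "C" locale,
# which compares strings by character code.
setlocale(LC_COLLATE, "C");

my @sorted;
{
    use locale;    # inside this block, sort consults LC_COLLATE
    @sorted = sort @words;
}
print "@sorted\n";
```

On the Danish XP machine from the question, the first line printed would be expected to name a Danish locale; the exact string depends on the system. After the switch to "C", the list should come out in the same order as the no locale lines in the original post.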
Re: use locale; on ActiveState WIN32
by syphilis (Archbishop) on Oct 11, 2008 at 23:17 UTC
    Heh ... I get the same output as you for @a, but for @b with use locale I get something different:
    use locale: aaa aab aba abb Laa lab
    Based on the @a output I would expect:
    use locale: aaa aab aba abb lab Laa
    which is different to what both you and I are getting. Someone here will probably understand what's happening ... not me, but.

    Cheers,
    Rob
Re: use locale; on ActiveState WIN32
by Narveson (Chaplain) on Oct 12, 2008 at 10:01 UTC

    The sequence 'aa' is being sorted as if you meant å (U+00E5, LATIN SMALL LETTER A WITH RING ABOVE).

    Am I right in thinking this letter comes quite a bit later than 'a' in Danish dictionaries?

      Hi Narveson

      Thank you, I've solved the challenge. Great that an American should solve a problem with my native alphabet.

      Thank you, and have a nice day :-)

      :-) Henrik
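Narveson's explanation can be reproduced without any locale support. This is only a sketch under stated assumptions: the helper danish_like_key is hypothetical, it models just one rule (the digraph "aa" counts as å and sorts after z), and it ignores æ, ø and every other detail of real Danish collation. It is not how the Windows C library implements LC_COLLATE.

```perl
use strict;
use warnings;

# Hypothetical helper: build a sort key that mimics a single Danish
# collation rule, treating "aa" as the letter å, which sorts after z.
sub danish_like_key {
    my ($word) = @_;
    my $key = lc $word;    # ignore case at the first comparison level
    $key =~ s/aa/{/g;      # "{" (0x7B) is the ASCII character right after "z" (0x7A)
    return $key;
}

my @words  = qw(aaa aab aba abb Laa lab);
my @sorted = sort { danish_like_key($a) cmp danish_like_key($b) } @words;

print "@sorted\n";    # aba abb lab Laa aaa aab
```

The keys come out as "{a", "{b", "aba", "abb", "l{" and "lab", so the words that begin with "aa" move to the end and Laa lands after lab, which is the order reported in the original question.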