linuxfan has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I am trying to sort an array that has strings in the format "1-1-1", "1-1-2", "2-3-1" etc. in the ascending order such that 1-1-1 appears before 1-1-2 and so on. Think of each string representing a "chapter-subchapter-verse". I tried the following snippet but it doesn't give me the desirable result.

my @a = ("1-1-2", "6-1-2", "3-1-4"); my @b = sort { (split /-/, $a)[0] <=> (split /-/, $b)[0] && (split /-/, $a)[1] <=> (split /-/, $b)[1] && (split /-/, $a)[2] <=> (split /-/, $b)[2] } @a; print "@b\n";

Obviously something's lacking in my understanding of writing a custom sort function. Can someone help me out?

Many thanks

Replies are listed 'Best First'.
Re: Custom sort with string of numbers
by AnomalousMonk (Archbishop) on Aug 29, 2010 at 05:45 UTC

    Another approach is the 'decorate-sort-undecorate pattern' or Guttman-Rosler Transform (GRT), which uses the default lexical sorting of sort for maximum efficiency (even if the comparisons involved are numeric):

    >perl -wMstrict -le "my @unsorted = qw(1-1-2 6-1-2 3-1-4 3-11-4 3-2-4); my @sorted = map { s{ (\d+) }{ sprintf '%ld', $1 }xmsge; $_ } sort map { s{ (\d+) }{ sprintf '%010ld', $1 }xmsge; $_ } @unsorted ; print qq{@sorted}; " 1-1-2 3-1-4 3-2-4 3-11-4 6-1-2

    Update: Also: Take a look at Sort::Maker.

Re: Custom sort with string of numbers
by Anonymous Monk on Aug 29, 2010 at 05:03 UTC
    Use the logical or operator, not the logical and operator (consider what the cmp and <=> operators return!).

    Also, you could split once per string per sort. Or, better (for performance, though not necessarily for memory usage), use a Schwartzian transform so you only split once per string:

    @bar = map { $$_[0] } sort { $$a[1] <=> $$b[1] || $$a[2] <=> $$b[2] || $$a[3] <=> $$b[3] } map { [$_, split/-/] } @foo;
    This also helps keep the "what I'm splitting" and "what I'm sorting" code clear by separating them.
Re: Custom sort with string of numbers
by Marshall (Canon) on Aug 29, 2010 at 06:16 UTC
    No need to get real fancy here.

    #!/usr/bin/perl -w use strict; my @a = ("1-1-2", "6-1-2", "3-1-4", "1-1-1"); @a = sort{ my ($A_chapter,$A_sub_chapter,$A_verse) = split(/[-]/,$a); my ($B_chapter,$B_sub_chapter,$B_verse) = split(/[-]/,$b); $A_chapter <=> $B_chapter or $A_sub_chapter <=> $B_sub_chapter or $A_verse <=> $B_verse }@a; print "@a"; __END__ prints: 1-1-1 1-1-2 3-1-4 6-1-2
    The "Schwartzian Transform" is far less important that it used to be due to recent(since ver 5.8) sorting performance improvements. In any event, this appears to be a "how to do it?" question rather than a "how to do it faster?" question.

    updated with better variable names in the sort. Names and code formatting do matter.

    Yet another comment: The idea behind the "Schwartzian Transform" is to calculate all this stuff like in the split's in one single pass thru the data and then use the resulting values for comparison in the sort. That idea being that we've saved all this calculation on a per pair comparison basis. The idea of the "Guttman-Rosler Transform" is to make a single string value that can be compared with a string comparison (cmp). I don't think that the OP needs either one.

    If you have a choice in formatting data like the OP's case, one way is to choose to add leading zeroes so that the normal cmp sort order "works out", this is the "Guttman-Rosler Transform" without any transformation! ie. @array = sort @array; works.

    If the set to be sorted is relatively small, the only thing that really matters is how clear the sort is. I think the above is very clear about what it does.

      The "Schwartzian Transform" is far less important that it used to be due to recent sorting performance improvements.
      This sounds interesting. What improvements, and to what part of sorting?

      Does that mean that splitting or concatenating in sort is optimized away after the first time? And is this specific to the sort function, or does it apply elsewhere (eg, a sorting fuction you pass to a GUI toolkit's list widget)?

        I am running only Perl 5.10 now and can't easily benchmark vs older versions, but in the transition from earlier versions, I did.

        I figure that the basic sort algorithm just got a lot smarter and reduced the number of comparisons. Some other Monks here can probably explain why sorting got faster to a mind-numbing detail. But I think some problems like the worst case qsort of array is already in order have gone away. I'm just saying that sorting performance in general did increase, and noticeably so(even ST sorts).

        These various techniques like "Schwartzian Transform" are still applicable and they still improve performance when doing very large sorts.

        How much any performance difference matters depends upon the application -- how much stuff you have to sort and of course how often the sort runs!

        Update: The big Perl sorting improvement was 5.8, not 5.6. As it turned out, I skipped Active State's 5.8 release and wound up jumping from 5.6 to 5.10 when enough modules were available for 5.10 for it to make sense for me. 5.8 was released 7/18/2002, and 5.10 12/14/2008. So I think my upgrade to 5.10 happened approx late 2009. Sounds like almost all of the improvements that I saw performance wise were due to changes in 5.8.

        Does that mean that splitting or concatenating in sort is optimized away after the first time?

        No. What happens is that since the sort algorithm is better, fewer comparisons are done (and wild worst case things are avoided) therefore the impact of making a comparison faster is less. The Schwartzian Transform makes comparisons faster by pre-calculating whatever is required to get to the values (maybe splitting a line or whatever).

        At the end of the day, efficiency does matter and at some design point, it makes sense to spend more time coding and/or give up a bit in terms of clarity for that improved performance. However that should be a conscious decision - not a one size fits all deal.

      Dear Marshall and all other monks,

      Thanks a lot for helping me out!

Re: Custom sort with string of numbers
by salva (Canon) on Aug 29, 2010 at 08:51 UTC
    use Sort::Key::OID qw(oidsort); my @b = oidsort @a;