Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re: Custom sort with string of numbers

by Marshall (Canon)
on Aug 29, 2010 at 06:16 UTC ( [id://857888]=note: print w/replies, xml ) Need Help??


in reply to Custom sort with string of numbers

No need to get real fancy here.

#!/usr/bin/perl -w use strict; my @a = ("1-1-2", "6-1-2", "3-1-4", "1-1-1"); @a = sort{ my ($A_chapter,$A_sub_chapter,$A_verse) = split(/[-]/,$a); my ($B_chapter,$B_sub_chapter,$B_verse) = split(/[-]/,$b); $A_chapter <=> $B_chapter or $A_sub_chapter <=> $B_sub_chapter or $A_verse <=> $B_verse }@a; print "@a"; __END__ prints: 1-1-1 1-1-2 3-1-4 6-1-2
The "Schwartzian Transform" is far less important that it used to be due to recent(since ver 5.8) sorting performance improvements. In any event, this appears to be a "how to do it?" question rather than a "how to do it faster?" question.

updated with better variable names in the sort. Names and code formatting do matter.

Yet another comment: The idea behind the "Schwartzian Transform" is to calculate all this stuff like in the split's in one single pass thru the data and then use the resulting values for comparison in the sort. That idea being that we've saved all this calculation on a per pair comparison basis. The idea of the "Guttman-Rosler Transform" is to make a single string value that can be compared with a string comparison (cmp). I don't think that the OP needs either one.

If you have a choice in formatting data like the OP's case, one way is to choose to add leading zeroes so that the normal cmp sort order "works out", this is the "Guttman-Rosler Transform" without any transformation! ie. @array = sort @array; works.

If the set to be sorted is relatively small, the only thing that really matters is how clear the sort is. I think the above is very clear about what it does.

Replies are listed 'Best First'.
Re^2: Custom sort with string of numbers
by Anonymous Monk on Aug 29, 2010 at 06:46 UTC
    The "Schwartzian Transform" is far less important that it used to be due to recent sorting performance improvements.
    This sounds interesting. What improvements, and to what part of sorting?

    Does that mean that splitting or concatenating in sort is optimized away after the first time? And is this specific to the sort function, or does it apply elsewhere (eg, a sorting fuction you pass to a GUI toolkit's list widget)?

      I am running only Perl 5.10 now and can't easily benchmark vs older versions, but in the transition from earlier versions, I did.

      I figure that the basic sort algorithm just got a lot smarter and reduced the number of comparisons. Some other Monks here can probably explain why sorting got faster to a mind-numbing detail. But I think some problems like the worst case qsort of array is already in order have gone away. I'm just saying that sorting performance in general did increase, and noticeably so(even ST sorts).

      These various techniques like "Schwartzian Transform" are still applicable and they still improve performance when doing very large sorts.

      How much any performance difference matters depends upon the application -- how much stuff you have to sort and of course how often the sort runs!

      Update: The big Perl sorting improvement was 5.8, not 5.6. As it turned out, I skipped Active State's 5.8 release and wound up jumping from 5.6 to 5.10 when enough modules were available for 5.10 for it to make sense for me. 5.8 was released 7/18/2002, and 5.10 12/14/2008. So I think my upgrade to 5.10 happened approx late 2009. Sounds like almost all of the improvements that I saw performance wise were due to changes in 5.8.

        I figure that the basic sort algorithm just got a lot smarter and reduced the number of comparisons.
        AFAIK, the latest significant change in sorting happened in 5.6.0, when Perl changed to using mergesort by default, instead of quicksort (and introduced a pragma to switch). Also, when quicksort is being used, and the array is larger than a certain treshold, the array is shuffled first. Before, it was rather easy to generate an input that took quadratic time to sort (for instance, a list of the form 1 .. $N, reverse(1 .. $N) took quadratic time).

        The mergesort that has been implemented is pretty nifty, in the sense that it takes advantage of (reverse) sorted sublists. This makes it fast on almost sorted lists (the Achilles heel of traditional quicksort). It's also stable.

        However, all this happened a decade ago.

        These various techniques like "Schwartzian Transform" are still applicable and they still improve performance when doing very large sorts.
        Yes, and in almost all cases, a GRT is faster than an ST. However, considering the number of chapters/subchapters/subsubchapters in a book, I doubt the difference is really going to matter.

        Since the sort is stable, there's an alternative way of sorting the data:

        @data = map {join "-", @$_} sort {$a->[0] <=> $b->[0]} sort {$a->[1] <=> $b->[1]} sort {$a->[2] <=> $b->[2]} map {[split /-/]} @data;
        You won't be able to do that with Perl's quicksort implementation.

        Here's another way to sort the given data. It uses neither mergesort, nor quicksort:

        my @d = map {[split /-/]} @data; foreach my $i (reverse (0 .. 2)) { my @t; push @{$t[$$_[$i]]}, $_ for @d; @d = map {@{$_ || []}} @t; } @data = map {join "-", @$_} @d;
      Does that mean that splitting or concatenating in sort is optimized away after the first time?

      No. What happens is that since the sort algorithm is better, fewer comparisons are done (and wild worst case things are avoided) therefore the impact of making a comparison faster is less. The Schwartzian Transform makes comparisons faster by pre-calculating whatever is required to get to the values (maybe splitting a line or whatever).

      At the end of the day, efficiency does matter and at some design point, it makes sense to spend more time coding and/or give up a bit in terms of clarity for that improved performance. However that should be a conscious decision - not a one size fits all deal.

Re^2: Custom sort with string of numbers
by linuxfan (Beadle) on Aug 29, 2010 at 15:17 UTC
    Dear Marshall and all other monks,

    Thanks a lot for helping me out!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://857888]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (7)
As of 2024-04-24 10:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found