in reply to Re: Custom sort with string of numbers
in thread Custom sort with string of numbers

The "Schwartzian Transform" is far less important than it used to be due to recent sorting performance improvements.
This sounds interesting. What improvements, and to what part of sorting?

Does that mean that splitting or concatenating in sort is optimized away after the first time? And is this specific to the sort function, or does it apply elsewhere (e.g., a sorting function you pass to a GUI toolkit's list widget)?

Replies are listed 'Best First'.
Re^3: Custom sort with string of numbers
by Marshall (Canon) on Aug 29, 2010 at 07:28 UTC
    I am running only Perl 5.10 now and can't easily benchmark against older versions, but I did notice the difference when I made the transition from earlier versions.

    I figure that the basic sort algorithm just got a lot smarter and reduced the number of comparisons. Some other Monks here can probably explain in mind-numbing detail why sorting got faster. But I think some pathological cases, like the worst-case qsort of an array that is already in order, have gone away. I'm just saying that sorting performance in general did increase, and noticeably so (even ST sorts).

    These various techniques like "Schwartzian Transform" are still applicable and they still improve performance when doing very large sorts.
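    For reference, here is a minimal sketch of the Schwartzian Transform being discussed, assuming hyphen-separated numeric strings like the OP's "2-10-3" data (the field names and sample data are illustrative, not from the thread):

```perl
use strict;
use warnings;

my @data = qw(10-2-1 2-1-1 2-10-3 1-3-2);

# Schwartzian Transform: split each string exactly once up front,
# sort on the cached fields, then discard the scaffolding.
my @sorted = map  { $_->[0] }                    # 3. recover the original strings
             sort { $a->[1] <=> $b->[1]          # 2. compare precomputed fields
                 || $a->[2] <=> $b->[2]
                 || $a->[3] <=> $b->[3] }
             map  { [ $_, split /-/ ] }          # 1. split each line only once
             @data;

print "@sorted\n";   # 1-3-2 2-1-1 2-10-3 10-2-1
```

    Without the transform, `split` would run inside the comparison sub, i.e. O(n log n) times instead of n times; that is the cost the ST removes.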

    How much any performance difference matters depends upon the application -- how much stuff you have to sort and of course how often the sort runs!

    Update: The big Perl sorting improvement was 5.8, not 5.6. As it turned out, I skipped ActiveState's 5.8 release and wound up jumping from 5.6 to 5.10 when enough modules were available for 5.10 for it to make sense for me. 5.8 was released 7/18/2002, and 5.10 on 12/18/2007. So I think my upgrade to 5.10 happened approximately late 2009. Sounds like almost all of the performance improvements that I saw were due to changes in 5.8.

      I figure that the basic sort algorithm just got a lot smarter and reduced the number of comparisons.
      AFAIK, the latest significant change in sorting happened in 5.8.0, when Perl changed to using mergesort by default instead of quicksort (and introduced a sort pragma to switch). Also, when quicksort is being used and the array is larger than a certain threshold, the array is shuffled first. Before that, it was rather easy to generate input that took quadratic time to sort (for instance, a list of the form 1 .. $N, reverse(1 .. $N)).

      The mergesort that has been implemented is pretty nifty, in the sense that it takes advantage of (reverse) sorted sublists. This makes it fast on almost sorted lists (the Achilles heel of traditional quicksort). It's also stable.
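      The pragma and the stability guarantee mentioned above can be exercised like this (a minimal sketch; note that on modern Perls the algorithm-selection hints are accepted but ignored, since quicksort has since been removed entirely):

```perl
use strict;
use warnings;

# 'stable' guarantees equal keys keep their input order, even if the
# default algorithm were ever to change (available since Perl 5.8).
use sort 'stable';

my @pairs = ( [1, 'b'], [0, 'x'], [1, 'a'], [0, 'y'] );

# Sort on the first element only; a stable sort must keep
# 'b' before 'a' and 'x' before 'y' within equal keys.
my @sorted = sort { $a->[0] <=> $b->[0] } @pairs;

print join(' ', map { "$_->[0]$_->[1]" } @sorted), "\n";   # 0x 0y 1b 1a
```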

      However, all this happened a decade ago.

      These various techniques like "Schwartzian Transform" are still applicable and they still improve performance when doing very large sorts.
      Yes, and in almost all cases, a GRT (Guttman-Rosler Transform) is faster than an ST. However, considering the number of chapters/subchapters/subsubchapters in a book, I doubt the difference is really going to matter.
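      For the curious, a GRT on the same kind of data might look like this (a sketch with illustrative sample data; the 12-byte key width follows from packing three 32-bit integers):

```perl
use strict;
use warnings;

my @data = qw(10-2-1 2-1-1 2-10-3 1-3-2);

# GRT: pack the numeric fields into a fixed-width, big-endian key so
# that plain byte-wise comparison agrees with numeric order, sort with
# no comparison sub at all, then strip the key off again.
my @sorted = map  { substr $_, 12 }                # 3. drop the 12-byte key
             sort                                  # 2. default (fast) string sort
             map  { pack('N3', split /-/) . $_ }   # 1. prepend packed key
             @data;

print "@sorted\n";   # 1-3-2 2-1-1 2-10-3 10-2-1
```

      The win over an ST is that the comparison sub disappears entirely; sort falls back to raw string comparison in C.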

      Since the sort is stable, there's an alternative way of sorting the data:

      @data = map  { join "-", @$_ }
              sort { $a->[0] <=> $b->[0] }
              sort { $a->[1] <=> $b->[1] }
              sort { $a->[2] <=> $b->[2] }
              map  { [ split /-/ ] }
              @data;
      You won't be able to do that with Perl's quicksort implementation.

      Here's another way to sort the given data. It uses neither mergesort nor quicksort:

      my @d = map { [ split /-/ ] } @data;
      foreach my $i (reverse (0 .. 2)) {
          my @t;
          push @{ $t[ $$_[$i] ] }, $_ for @d;
          @d = map { @{ $_ || [] } } @t;
      }
      @data = map { join "-", @$_ } @d;
Re^3: Custom sort with string of numbers
by Marshall (Canon) on Aug 30, 2010 at 21:09 UTC
    Does that mean that splitting or concatenating in sort is optimized away after the first time?

    No. What happens is that since the sort algorithm is better, fewer comparisons are done (and pathological worst cases are avoided), so the impact of making each comparison faster is smaller. The Schwartzian Transform makes comparisons faster by pre-calculating whatever is required to get at the values (splitting a line, for example).

    At the end of the day, efficiency does matter, and at some design point it makes sense to spend more time coding and/or give up a bit of clarity for improved performance. However, that should be a conscious decision - not a one-size-fits-all deal.