Re: Custom sort with string of numbers

No need to get real fancy here.

#!/usr/bin/perl -w
use strict;

my @a = ("1-1-2", "6-1-2", "3-1-4", "1-1-1");

@a = sort{  my ($A_chapter,$A_sub_chapter,$A_verse) = split(/[-]/,$a);
            my ($B_chapter,$B_sub_chapter,$B_verse) = split(/[-]/,$b);
                       $A_chapter <=> $B_chapter
                                  or
                    $A_sub_chapter <=> $B_sub_chapter
                                  or
                          $A_verse <=> $B_verse
         }@a;
         
print "@a"; 

__END__
prints: 1-1-1 1-1-2 3-1-4 6-1-2
[download]

The "Schwartzian Transform" is far less important that it used to be due to ~~recent~~(since ver 5.8) sorting performance improvements. In any event, this appears to be a "how to do it?" question rather than a "how to do it faster?" question.

updated with better variable names in the sort. Names and code formatting do matter.

Yet another comment: The idea behind the "Schwartzian Transform" is to calculate all this stuff like in the split's in one single pass thru the data and then use the resulting values for comparison in the sort. That idea being that we've saved all this calculation on a per pair comparison basis. The idea of the "Guttman-Rosler Transform" is to make a single string value that can be compared with a string comparison (cmp). I don't think that the OP needs either one.

If you have a choice in formatting data like the OP's case, one way is to choose to add leading zeroes so that the normal cmp sort order "works out", this is the "Guttman-Rosler Transform" without any transformation! ie. @array = sort @array; works.

If the set to be sorted is relatively small, the only thing that really matters is how clear the sort is. I think the above is very clear about what it does.

Comment on Re: Custom sort with string of numbers Download Code

Replies are listed 'Best First'.
Re^2: Custom sort with string of numbers by Anonymous Monk on Aug 29, 2010 at 06:46 UTC
The "Schwartzian Transform" is far less important that it used to be due to recent sorting performance improvements. This sounds interesting. What improvements, and to what part of sorting? Does that mean that splitting or concatenating in sort is optimized away after the first time? And is this specific to the `sort` function, or does it apply elsewhere (eg, a sorting fuction you pass to a GUI toolkit's list widget)?	[reply] [d/l]
Re^3: Custom sort with string of numbers by Marshall (Canon) on Aug 29, 2010 at 07:28 UTC
I am running only Perl 5.10 now and can't easily benchmark vs older versions, but in the transition from earlier versions, I did. I figure that the basic sort algorithm just got a lot smarter and reduced the number of comparisons. Some other Monks here can probably explain why sorting got faster to a mind-numbing detail. But I think some problems like the worst case qsort of array is already in order have gone away. I'm just saying that sorting performance in general did increase, and noticeably so(even ST sorts). These various techniques like "Schwartzian Transform" are still applicable and they still improve performance when doing very large sorts. How much any performance difference matters depends upon the application -- how much stuff you have to sort and of course how often the sort runs! Update: The big Perl sorting improvement was 5.8, not 5.6. As it turned out, I skipped Active State's 5.8 release and wound up jumping from 5.6 to 5.10 when enough modules were available for 5.10 for it to make sense for me. 5.8 was released 7/18/2002, and 5.10 12/14/2008. So I think my upgrade to 5.10 happened approx late 2009. Sounds like almost all of the improvements that I saw performance wise were due to changes in 5.8.	[reply]
Re^4: Custom sort with string of numbers by JavaFan (Canon) on Aug 29, 2010 at 14:46 UTC
I figure that the basic sort algorithm just got a lot smarter and reduced the number of comparisons. AFAIK, the latest significant change in sorting happened in 5.6.0, when Perl changed to using mergesort by default, instead of quicksort (and introduced a pragma to switch). Also, when quicksort is being used, and the array is larger than a certain treshold, the array is shuffled first. Before, it was rather easy to generate an input that took quadratic time to sort (for instance, a list of the form `1 .. $N, reverse(1 .. $N)` took quadratic time). The mergesort that has been implemented is pretty nifty, in the sense that it takes advantage of (reverse) sorted sublists. This makes it fast on almost sorted lists (the Achilles heel of traditional quicksort). It's also stable. However, all this happened a decade ago. These various techniques like "Schwartzian Transform" are still applicable and they still improve performance when doing very large sorts. Yes, and in almost all cases, a GRT is faster than an ST. However, considering the number of chapters/subchapters/subsubchapters in a book, I doubt the difference is really going to matter. Since the sort is stable, there's an alternative way of sorting the data: `@data = map {join "-", @$_} sort {$a->[0] <=> $b->[0]} sort {$a->[1] <=> $b->[1]} sort {$a->[2] <=> $b->[2]} map {[split /-/]} @data;` [download] You won't be able to do that with Perl's quicksort implementation. Here's another way to sort the given data. It uses neither mergesort, nor quicksort: `my @d = map {[split /-/]} @data; foreach my $i (reverse (0 .. 2)) { my @t; push @{$t[$$_[$i]]}, $_ for @d; @d = map {@{$_ \|\| []}} @t; } @data = map {join "-", @$_} @d;` [download]	[reply] [d/l] [select]
Re^3: Custom sort with string of numbers by Marshall (Canon) on Aug 30, 2010 at 21:09 UTC
Does that mean that splitting or concatenating in sort is optimized away after the first time? No. What happens is that since the sort algorithm is better, fewer comparisons are done (and wild worst case things are avoided) therefore the impact of making a comparison faster is less. The Schwartzian Transform makes comparisons faster by pre-calculating whatever is required to get to the values (maybe splitting a line or whatever). At the end of the day, efficiency does matter and at some design point, it makes sense to spend more time coding and/or give up a bit in terms of clarity for that improved performance. However that should be a conscious decision - not a one size fits all deal.	[reply]
Re^2: Custom sort with string of numbers by linuxfan (Beadle) on Aug 29, 2010 at 15:17 UTC
Dear Marshall and all other monks, Thanks a lot for helping me out!	[reply]