Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Guys I was trying to Benchmark Schwartzian Transform versus regular sorting, and wrote a little program, maybe you can improve it. :-)
#! /usr/bin/perl -w ############################################################ # File : Perl Schwartzian Transform Demo.pl # History: 16-Mar-2016 Laszlo first version. ############################################################ # A script to demonstrate and measure the Schwartzian Transform. ############################################################ use warnings; use strict; use Time::HiRes; my $maxNumberValue = 100; my $intSampleSize = 32000; my @unsortedIntArray = map { int(rand($maxNumberValue)) } ( 1..$intSam +pleSize ); &schwartzianSortVsRegularSort(\@unsortedIntArray, \&compareAsNumber, \ +&noTransform); &schwartzianSortVsRegularSort(\@unsortedIntArray, \&compareAsString, \ +&noTransform); &schwartzianSortVsRegularSort(\@unsortedIntArray, \&compareAsNumber, \ +&sleepyNoTransform); my @chars = ("A".."Z", "a".."z"); my $maxStringLength = 300; my $stringSampleSize = 32000; my @unsortedStringArray = map { (sub { my $string; $string .= $chars[r +and @chars] for 1..(int(rand($maxStringLength)) + 1); return $string; +})->(); } ( 1..$stringSampleSize ); &schwartzianSortVsRegularSort(\@unsortedStringArray, \&compareAsString +, \&noTransform); &schwartzianSortVsRegularSort(\@unsortedStringArray, , \&compareAsNumb +er, \&returnLength ); sub schwartzianSortVsRegularSort() { my @unsortedArray_in = @{$_[0]}; my $compareFunction_in = $_[1]; my $transform4Sort_in = $_[2]; print "Collection of size: " . +@unsortedArray_in . " \n"; # Schwartzian sort, Schwartzian sort, Schwartzian sort, Schwartzia +n sort, Schwartzian sort ############################################################ my $schwartzianStart = Time::HiRes::time; my @schwartzianSorted = map { $_->[0] } sort { &$compareFunction_in($a->[1], $b->[1]) } map { [$_, &$transform4Sort_in($_)] } @unsortedArray_in; my $schwartzianDuration = Time::HiRes::time - $schwartzianStart; # Regular sort, Regular sort, Regular sort, Regular sort, Regular +sort, Regular sort ############################################################ my $regularStart = Time::HiRes::time; my @regularSorted = sort { &$compareFunction_in(&$transform4Sort_i +n($a), &$transform4Sort_in($b)) } @unsortedArray_in; my $regularDuration = Time::HiRes::time - $regularStart; print "Execution time of the Schwartzian Transform: $schwartzianDu +ration s\n"; print "Execution time of the Regular Transform: $regularDurati +on s\n"; if (!(@schwartzianSorted ~~ @regularSorted)) { die "Collections are not equal!" } # print join("; ", @schwartzianSorted); } sub noTransform() { my $param = shift; return $param; } sub sleepyNoTransform() { my $param = shift; Time::HiRes::usleep(1); return $param; } sub returnLength() { my $param = shift; return length($param); } sub sortAsString() { my $param = shift; return "$param"; } sub compareAsNumber() { my $a = shift; my $b = shift; return $a <=> $b; } sub compareAsString() { my $a = shift; my $b = shift; return $a cmp $b; }

Replies are listed 'Best First'.
Re: Benchmark Schwartzian Transform
by thunders (Priest) on Mar 16, 2015 at 22:23 UTC

    The Benchmark module is a big help with this sort of thing.

    What the Schwartzian Transform does is reduce the overall number of calls to the function that computes the value of your item. Your sort is going to do roughly (n * log n) comparisons where n is the number of items in your list. If the function that computes the value of the list item is significantly more expensive than a simple comparison you want to reduce the number of calls to that more expensive function.

    The map at the end of the Schwartzian transform only runs the function once for each element of the list. Of course you incur some overhead in building an arrayref, and looping again at the end, but for lists with many items reducing the number of computations can save a lot of cycles. You'll see very different results based on how expensive it is to compute the sort value.

Re: Benchmark Schwartzian Transform
by Anonymous Monk on Mar 16, 2015 at 22:23 UTC

    Impressive how this script was transmitted to us from one year in the future ;-)

    In the real code, will you be using subs like your compareAsString, or will you just write sort @whatever? If the latter, then here you're benchmarking code that you've artificially slowed down with the overhead of lots of sub calls.

    Also, you might be surprised at the results: the Schwartzian Transform won't give you any improvement for simple comparisons! More specifically, you'll have no performance gain as long as the overhead of the two transforms is greater than the performance gain of the actual sort (makes sense, no?). Throwing in a usleep(1), as you did, will show you that, but you're still not testing the execution time of actual code.

    So what the previous two paragraphs are getting at: What is the actual, specific code you'd like to benchmark?

    The code itself is a bit problematic: You've declared all of your subs with an empty prototype, but then they all have parameters, which only works because you're using the older &subname calling syntax, which avoids prototypes. And, you're re-using the variable names $a and $b, which are special in sort functions (not necessarily wrong, just confusing). I'd just write these short sort routines inline.

    Also, you could be using the Benchmark module to do your benchmarks for you. Here's a quick example (disclaimer: I'm not a Benchmarking expert but I hope I got it right).

    #!/usr/bin/env perl use warnings; use strict; use Benchmark 'cmpthese'; use List::Util 'shuffle'; use Data::Dumper; my @sorted; my @input = map { 'F' x (int(rand(100))+1) } 1 .. 1000; #print Dumper(\@input); cmpthese(1000, { simple => sub { @sorted = sort { length($a)<=>length($b) } shuffle @input; }, schwartz => sub { @sorted = map { $$_[0] } sort { $$a[1]<=>$$b[1] } map { [$_,length($_)] } shuffle @input; }, }); #print Dumper(\@sorted);

    The results I get are something like this:

    Rate schwartz simple schwartz 249/s -- -47% simple 472/s 90% --

    I.e. the normal sort is almost twice as fast! However, once you start making the criteria more complex (in the example replace length() with some_complex_computation()), then you'll start seeing the performance boost.

      Yes I like this answer, it points me to the benchmarking module. And gives example. Thank you.
      I do understand how ST works and I do understand that inlineing is better. Well I wanted to write a generic code, where I could try out some functions. Well I did got some better ideas from your post. :-) How would you write some generic code?
Re: Benchmark Schwartzian Transform
by Laurent_R (Canon) on Mar 16, 2015 at 22:50 UTC
    First, I would suggest that you use the Benchmark module, which make a number of things much easier.

    Second, your code seems to be uselessly complicated (and even sometimes somewhat awkward) for what you want to do. Cascading sub calls may suffice to ruin the results of a benchmark.

    Third, unless you are writing an article on Schwartzian Transform (ST) or a book, it does not make very much sense to compare regular sort with ST sort on a theoretical dataset like that. The essence of ST is caching the data used for the comparison in the sort. Caching has a cost and is worthwhile only if this cost is small compared to the saving induced by not having to recalculate many times the data used for comparisons in the sort.

    So, unless you are just trying to make a theoretical study on ST sort vs. regular sort, the exercise is not very useful. Or, rather, let's say that it will probably not be possible to extrapolate the results you're gonna get to a real life scenario, because the CPU operations that you want to save with ST will not be similar in terms of time consumed, and/or the data samples will not have the same size (large datasets will usually give a slightly better advantage to ST sort, because the number of comparisons to be made grows faster than the size of the dataset). So, in brief, if you really want to benchmark ST sort versus regular sort, you do it with a realistic data sample of your use case.

    Having said all that, I am quite regularly using ST (or GRT) when the data for comparison is not immediately available in the dataset and needs some form of computation, not necessarily for performance reasons, but because I find it cleaner and simpler than having to explicitly declare and populate temporary arrays. Also, I like the "functional programing" (Lisp-like) aspect of the ST sort. I love the ability of Perl to do this sort of things.

    Je suis Charlie.
      I like this answer too, it pointed me Guttman Rosler Transform and to the following articles:

      http://www.perlmonks.org/?node=Advanced+Sorting+-+GRT+-+Guttman+Rosler+Transform

      http://www.sysarch.com/Perl/sort_paper.html

        Check also the Orcish Maneuver.
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Benchmark Schwartzian Transform
by GrandFather (Saint) on Mar 16, 2015 at 21:17 UTC

    Tell us why the benchmark is important and maybe we will. In particular, have you profiled your code to demonstrate that the sort is a bottle neck? How much is a speed gain here going to affect the performance of application?

    Perl is the programming world's equivalent of English
      No, I just was experimenting (learning trough failure.) :-)

        Fair enough. One of the best ways to learn! Even better (which you did), and harder, is to share your failure with others. You learn more and others learn too, although perhaps not as much. Sounds like a great success to me :-D.

        Perl is the programming world's equivalent of English
Re: Benchmark Schwartzian Transform
by Laszlo (Novice) on Mar 19, 2015 at 13:30 UTC
    Third improvement, which returned poorer results, for Schwartzian but posting it, maybe someone will spot an error. :-)
    #! /usr/bin/perl -w ############################################################# # File : Perl Schwartzian Transform Demo.pl # History: 16-Mar-2015 Laszlo first version. # 19-Mar-2015 Laszlo, improved a little bit. # 19-Mar-2015 Laszlo, did the benchmarking which returned p +oorer results... ############################################################# # A script to demonstrate and measure the Schwartzian Transform. ############################################################# # Some form of documentation: # http://www.perlmonks.org/?node_id=1120222 use warnings; use strict; use Time::HiRes; use Benchmark; use List::Util; my $maxNumberValue = 100; my $intSampleSize = 32000; my @unsortedIntArray = map { int(rand($maxNumberValue)) } ( 1..$intSam +pleSize ); my $compareAsNumberFunction = sub { return $_[0] <=> $_[1]; }; my $compareAsStringFunction = sub { return $_[0] cmp $_[1]; }; my $noTransformFunction = sub { return shift; }; my $sleepyNoTransformFunction = sub { Time::HiRes::usleep(1); return s +hift; }; my $returnLengthFunction = sub { return length(shift); }; schwartzianSortVsRegularSort(\@unsortedIntArray, $compareAsNumberFunct +ion, $noTransformFunction); schwartzianSortVsRegularSort(\@unsortedIntArray, $compareAsStringFunct +ion, $noTransformFunction); schwartzianSortVsRegularSort(\@unsortedIntArray, $compareAsNumberFunct +ion, $sleepyNoTransformFunction); my @chars = ("A".."Z", "a".."z"); my $maxStringLength = 300; my $stringSampleSize = 32000; my @unsortedStringArray = map { (sub { my $string; $string .= $chars[rand @chars] for 1..(int(rand($maxSt +ringLength)) + 1); return $string; })->(); } ( 1..$stringSampleSize ); schwartzianSortVsRegularSort(\@unsortedStringArray, $compareAsStringFu +nction, $noTransformFunction); schwartzianSortVsRegularSort(\@unsortedStringArray, , $compareAsNumber +Function, $returnLengthFunction); sub schwartzianSortVsRegularSort { my @unsortedArray_in = @{$_[0]}; my $compareFunction_in = $_[1]; my $transform4Sort_in = $_[2]; print "Collection of size: " . +@unsortedArray_in . " \n"; Benchmark::cmpthese(10, { regular => sub { my @regularSorted = sort { $compareFunction_in->($transfor +m4Sort_in->($a), $transform4Sort_in->($b)) } List::Util::shuffle @uns +ortedArray_in; }, schwartzian => sub { my @schwartzianSorted = map { $_->[0] } sort { $compareFunction_in->($a->[1], $b->[1]) } map { [$_, $transform4Sort_in->($_)] } List::Util::shuffle @unsortedArray_in; } }); }
Re: Benchmark Schwartzian Transform
by Anonymous Monk on Mar 19, 2015 at 12:25 UTC
    Okie, I learned a little bit from critique and here are my first improvements. (Created account name too).
    #! /usr/bin/perl -w ###################################################################### +## # File : Perl Schwartzian Transform Demo.pl # History: 16-Mar-2015 Laszlo first version. # 19-Mar-2015 Laszlo, improved a little bit. ###################################################################### +## # A script to demonstrate and measure the Schwartzian Transform. ###################################################################### +## # Some form of documentation: # http://www.perlmonks.org/?node_id=1120222 use warnings; use strict; use Time::HiRes; my $maxNumberValue = 100; my $intSampleSize = 32000; my @unsortedIntArray = map { int(rand($maxNumberValue)) } ( 1..$intSam +pleSize ); my $compareAsNumberFunction = sub { return $_[0] <=> $_[1]; }; my $compareAsStringFunction = sub { return $_[0] cmp $_[1]; }; my $noTransformFunction = sub { return shift; }; my $sleepyNoTransformFunction = sub { Time::HiRes::usleep(1); return s +hift; }; my $returnLengthFunction = sub { return length(shift); }; schwartzianSortVsRegularSort(\@unsortedIntArray, $compareAsNumberFunct +ion, $noTransformFunction); schwartzianSortVsRegularSort(\@unsortedIntArray, $compareAsStringFunct +ion, $noTransformFunction); schwartzianSortVsRegularSort(\@unsortedIntArray, $compareAsNumberFunct +ion, $sleepyNoTransformFunction); my @chars = ("A".."Z", "a".."z"); my $maxStringLength = 300; my $stringSampleSize = 32000; my @unsortedStringArray = map { (sub { my $string; $string .= $chars[r +and @chars] for 1..(int(rand($maxStringLength)) + 1); return $string; +})->(); } ( 1..$stringSampleSize ); schwartzianSortVsRegularSort(\@unsortedStringArray, $compareAsStringFu +nction, $noTransformFunction); schwartzianSortVsRegularSort(\@unsortedStringArray, , $compareAsNumber +Function, $returnLengthFunction); sub schwartzianSortVsRegularSort { my @unsortedArray_in = @{$_[0]}; my $compareFunction_in = $_[1]; my $transform4Sort_in = $_[2]; print "Collection of size: " . +@unsortedArray_in . " \n"; # Schwartzian sort, Schwartzian sort, Schwartzian sort, Schwartzia +n sort, Schwartzian sort ################################################################## +######################## my $schwartzianStart = Time::HiRes::time; my @schwartzianSorted = map { $_->[0] } sort { $compareFunction_in->($a->[1], $b->[1]) } map { [$_, $transform4Sort_in->($_)] } @unsortedArray_in; my $schwartzianDuration = Time::HiRes::time - $schwartzianStart; # Regular sort, Regular sort, Regular sort, Regular sort, Regular +sort, Regular sort ################################################################## +######################## my $regularStart = Time::HiRes::time; my @regularSorted = sort { $compareFunction_in->($transform4Sort_i +n->($a), $transform4Sort_in->($b)) } @unsortedArray_in; my $regularDuration = Time::HiRes::time - $regularStart; print "Execution time of the Schwartzian Transform: $schwartzianDu +ration s\n"; print "Execution time of the Regular Transform: $regularDurati +on s\n"; if (!(@schwartzianSorted ~~ @regularSorted)) { die "Collections are not equal!" } #print join("; ", @schwartzianSorted); }
    Next improvements will be with the benchmark. :-) Now that gave a worst result...