The High Price of Golf, and A Surprise

It occured to me, in a fit of dementia, that a numeric sort with subtraction instead of the <=> operator would shave two strokes in Perl Golf. I became curious about the cost of doing that.

I threw together a benchmark and got a surprise.

No, the subtraction method isn't fast: in fact, it more than doubles the time taken in sorting. The surprise is that the spaceship operator is faster than dictionary sort.

Received wisdom in the Perl community is that its string sort is optimum and there is a price to pay for numeric sort. In this benchmark, each type of sort receives the same data, but already cast to float, integer, or string. As much extra processing as possible is outside the timing loop, so that only sort performance is measured, without any representation penalties for one method over another. Spaceship is not penalized by a string-to-float conversion, for instance. The numbers are the same, but each method gets them in predigested form. No measured time is spent on my or other allocation, since the result is stored in a preallocated array, sized to the data it will receive.

The data (Edit - New, from corrected code):

$ perl sortbench.pl
Benchmark: timing 1000 iterations of f_alpha, f_owtdi, f_sship, i_alph
+a,
 i_owtdi, i_sship...
   f_alpha: 57 wallclock secs (51.10 usr +  0.02 sys = 51.12 CPU) @ 19
+.56/s (n=1000)
   f_owtdi: 92 wallclock secs (83.53 usr +  0.01 sys = 83.54 CPU) @ 11
+.97/s (n=1000)
   f_sship: 37 wallclock secs (33.55 usr +  0.00 sys = 33.55 CPU) @ 29
+.81/s (n=1000)
   i_alpha: 41 wallclock secs (36.44 usr +  0.01 sys = 36.45 CPU) @ 27
+.43/s (n=1000)
   i_owtdi: 91 wallclock secs (82.73 usr +  0.02 sys = 82.75 CPU) @ 12
+.08/s (n=1000)
   i_sship: 38 wallclock secs (32.81 usr +  0.00 sys = 32.81 CPU) @ 30
+.48/s (n=1000)
          Rate f_owtdi i_owtdi f_alpha i_alpha f_sship i_sship
f_owtdi 12.0/s      --     -1%    -39%    -56%    -60%    -61%
i_owtdi 12.1/s      1%      --    -38%    -56%    -59%    -60%
f_alpha 19.6/s     63%     62%      --    -29%    -34%    -36%
i_alpha 27.4/s    129%    127%     40%      --     -8%    -10%
f_sship 29.8/s    149%    147%     52%      9%      --     -2%
i_sship 30.5/s    155%    152%     56%     11%      2%      --
[download]

Each is a sort of 10000 random numbers from ~~-1000~~ 0 to 1000, repeated 1000 times. The array to sort is generated numerically as floats. It is then translated to integers, leading zero fixed point decimal strings, and leading zero integer strings. An array to assign for each type is preallocated.

It appears to me that spaceship's reputed slowness goes away when its data is already numeric, and the arena is levelled by insisting on equivalent data for both forms of sort.

Here is the benchmark code (Edit - Corrected to avoid negative range and to word align strings):

#!/usr/bin/perl -w
use strict;
use Benchmark 'cmpthese';

my $count = 1000;
my @nfl = map { int( rand(1000) ) } 1..10000;
my @nin = map { int } @nfl;
my @ain = map { sprintf "%03d", $_ } @nin;
my @afl = map { my $str = sprintf "%3.12d", $_;
                '0' x (3 - index $str, '.')
                . $str;
              } @nfl;

my @rin = (int 1) x 10000;
my @rfl = (1.01)  x 10000;
my @sfl = ('z' x 16) x 10000;
my @sin = ('z' x 4)  x 10000;

cmpthese(
    $count,
    {
      f_owtdi   => sub { @rfl = sort { $a - $b }   @nfl },
      i_owtdi   => sub { @rin = sort { $a - $b }   @nin },
      f_sship   => sub { @rfl = sort { $a <=> $b } @nfl },
      i_sship   => sub { @rin = sort { $a <=> $b } @nin },
      f_alpha   => sub { @sfl = sort               @afl },
      i_alpha   => sub { @sin = sort               @ain },
    });
[download]

Update: Code and data corrections, thanks to bart for the spot.

After Compline,
Zaxo

Edit by tye to change PRE to CODE around wide lines

Comment on The High Price of Golf, and A Surprise Select or Download Code

Replies are listed 'Best First'.
Re: The High Price of Golf, and A Surprise by bart (Canon) on Sep 07, 2002 at 09:09 UTC
You should include `cmp` in your benchmark. I think you'll find it slower than the built-in sort. That speed difference is the overhead of the callback sub. I expect that `cmp` in general is a far slower operation than `<=>`. The latter only takes one CPU instruction (plus the overhead of the interpreter), the former is a slow library call for many strings — especially if the strings you compare are identical, because now all characters in the strings have to be examined. And I think what you found is that the callback overhead for `sort` is still faster than the speed difference between these two ops. Enough blah blah. I've added them to your benchmark code, so it now looks like: `cmpthese( $count, { f_owtdi => sub { @rfl = sort { $a - $b } @nfl }, i_owtdi => sub { @rin = sort { $a - $b } @nin }, f_sship => sub { @rfl = sort { $a <=> $b } @nfl }, i_sship => sub { @rin = sort { $a <=> $b } @nin }, f_alpha => sub { @sfl = sort @afl }, i_alpha => sub { @sin = sort @ain }, f_cmp => sub { @sfl = sort { $a cmp $b } @afl }, i_cmp => sub { @sin = sort { $a cmp $b } @ain }, });` [download] I'll update with the results in a moment. This benchmark takes many minutes to run, and I don't want to skew the results by doing heavy stuff with my humble PC. Update: I'm back. Here are the results: Benchmark: timing 300 iterations of f_alpha, f_cmp, f_owtdi, f_sship, +i_alpha, i_cmp, i_owtdi, i_sship... f_alpha: 26 wallclock secs (25.92 usr + 0.00 sys = 25.92 CPU) @ 11 +.57/s (n=300) f_cmp: 26 wallclock secs (25.87 usr + 0.00 sys = 25.87 CPU) @ 11 +.60/s (n=300) f_owtdi: 37 wallclock secs (36.58 usr + 0.00 sys = 36.58 CPU) @ 8 +.20/s (n=300) f_sship: 17 wallclock secs (16.92 usr + 0.00 sys = 16.92 CPU) @ 17 +.73/s (n=300) i_alpha: 21 wallclock secs (20.76 usr + 0.00 sys = 20.76 CPU) @ 14 +.45/s (n=300) i_cmp: 20 wallclock secs (20.82 usr + 0.00 sys = 20.82 CPU) @ 14 +.41/s (n=300) i_owtdi: 35 wallclock secs (35.53 usr + 0.00 sys = 35.53 CPU) @ 8 +.44/s (n=300) i_sship: 16 wallclock secs (15.87 usr + 0.00 sys = 15.87 CPU) @ 18 +.90/s (n=300) Rate f_owtdi i_owtdi f_alpha f_cmp i_cmp i_alpha f_sship + i_sship f_owtdi 8.20/s -- -3% -29% -29% -43% -43% -54% + -57% i_owtdi 8.44/s 3% -- -27% -27% -41% -42% -52% + -55% f_alpha 11.6/s 41% 37% -- -0% -20% -20% -35% + -39% f_cmp 11.6/s 41% 37% 0% -- -20% -20% -35% + -39% i_cmp 14.4/s 76% 71% 24% 24% -- -0% -19% + -24% i_alpha 14.5/s 76% 71% 25% 25% 0% -- -18% + -24% f_sship 17.7/s 116% 110% 53% 53% 23% 23% -- + -6% i_sship 18.9/s 130% 124% 63% 63% 31% 31% 7% + -- [download] That's odd. there is NO speed difference between [fi]_cmp and [fi]_alpha. That means that the cost of the callback is negligable... unless Perl is optimizing the callback away? Just to make sure, I've also swapped $a and $b in my callback sub. It doesn't make a difference. Conclusion: the speed gain you get by using numerical sort, is entirely due to the speed difference between the ops `<=>` and `cmp`.	[reply] [d/l] [select]
Re^2: The High Price of Golf, and A Surprise by Aristotle (Chancellor) on Sep 07, 2002 at 11:25 UTC
Interesting: `$ perl -MO=Deparse,x7 -e'sort { $a cmp $b } @x' sort @x; -e syntax OK $ perl -MO=Deparse,x7 -e'sort { $b cmp $a } @x' sort @x; -e syntax OK $ perl -MO=Deparse,x7 -e'sort { $a <=> $b } @x' sort @x; -e syntax OK $ perl -MO=Deparse,x7 -e'sort { $b <=> $a } @x' sort @x; -e syntax OK` [download] So far it looks as though B::Deparse is broken. `$ perl -MO=Deparse,x7 -e'sort { $a->[0] cmp $b->[0] } @x' sort {$$a[0] cmp $$b[0];} @x; -e syntax OK` [download] Obviously, it is not. Looks rather like Perl has builtins for all the common cases, then. Makeshifts last the longest.	[reply] [d/l] [select]
Re*Re^2: The High Price of Golf, and A Surprise by bart (Canon) on Sep 07, 2002 at 11:45 UTC
That's very interesting. I had noticed the same thing. Your conclusion might very well be correct, and I hadn't even thought of that possibility. Could the next be a special common case, too? `perl -MO=Deparse,x7 -e'sort { $b cmp $a } @x' sort @x; -e syntax OK` [download] Well well well, it looks like it. It would explain why there appears to be no overhead for the callback. It can also explain the huge speed difference between "<=>" and "-": only the latter actually uses the callback. Otherwise, it would make no sense: "-" is a very fast operator, and checking the sign of the result should be just as fast. So it must be the overhead of the callback. Nevertheless, B::Deparse is broken, because the code it produces is obviously not equivalent to the original source code.	[reply] [d/l]
Re^4: The High Price of Golf, and A Surprise by Aristotle (Chancellor) on Sep 07, 2002 at 11:53 UTC
Re: Re*Re^2: The High Price of Golf, and A Surprise by hv (Prior) on Mar 04, 2003 at 00:27 UTC
Re:x3 The High Price of Golf, and A Surprise by grinder (Bishop) on Sep 07, 2002 at 21:15 UTC
`perldoc perldelta`, for 5.6.1 says the following about sort: Simple sort() using `{ $a <=> $b }` and the like are optimized Many common sort() operations using a simple inlined block are now optimized for faster performance. Mind you, I was under the impression that this optimisation occurred earlier, sometime back around the 5.004 to 5.005 timeframe. I'll have to go and look on some of my older machines. print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'	[reply] [d/l]
Re: Re^2: The High Price of Golf, and A Surprise by broquaint (Abbot) on Sep 07, 2002 at 17:04 UTC
Small note which makes no difference to the deparsed syntax is that `sort()` used in a `void` context is a no-op (sp?). So all those `sort()`s could be deparsed down to `$ perl -MO=Deparse,x7 -e'sort { $b <=> $a } @x' -e syntax OK` [download] `_________ broquaint`	[reply] [d/l]
Re^4: The High Price of Golf, and A Surprise by Aristotle (Chancellor) on Sep 07, 2002 at 18:27 UTC
Re: The High Price of Golf, and A Surprise by ChemBoy (Priest) on Sep 07, 2002 at 23:55 UTC
My memory of this issue was that the unmodified sort was optimized in older versions of perl, but that recent editions had optimized all four of the common sorts (numeric and standard, forward and backwards). Fortunately (in some sense) I'm equipped to test this hypothesis, having some older installs to work with. I used bart's benchmarking code, for completeness. Read more... (10 kB) It would seem, if I'm reading this right, that the optimization of the basic sort in older versions is just barely enough to make it faster than numeric sort: now that the numeric sort is optimized as well, its inherent speed advantage shows. In 5.6.1, all of the sorts are faster, but the spaceship and cmp sorts improve by much more than the others, because the callback overhead referred to elswhere in this thread is removed. If God had meant us to fly, he would never have given us the railroads. --Michael Flanders	[reply] [d/l] [select]


Your skill will accomplish what the force of many cannot
	PerlMonks