in reply to Speed/Efficiency tweaks for a fannkuch benchmark script?

This tweaks the implementation rather than the algorithm, which puts it firmly in the realm of what are usually called micro-optimisations--but it does achieve a nearly 300% speedup:

#! perl -slw
use strict;
use Time::HiRes qw( gettimeofday tv_interval );

my $maxflips = 0;    # highest flip count found so far
my @max_sequence;    # the permutation(s) that achieved it

for my $num ( 1 .. 10 ) {
    my @start_time = gettimeofday();
    @max_sequence = ();

    # Permutations are packed byte strings: one byte per element.
    print "Pfannkuchen($num) = " . fannkuch( pack 'C*', 1 .. $num ) . " for:";
    print unpack 'C*', $_ for sort @max_sequence;

    my @end_time = gettimeofday();
    print tv_interval( \@start_time, \@end_time ), " elapsed seconds.\n";
}

sub fannkuch {
    my ( $a, $level ) = ( @_, 0 );
    my ( $index, $ok, $copy ) = ( $level, $level + 1 == length( $a ), $a );
    do {
        if ( $ok ) {
            # The usual fannkuch shortcut: only count flips for
            # permutations that neither start with 1 nor end with n.
            if ( ord( $copy ) != 1
                and ord( substr( $copy, -1 ) ) != length( $copy ) ) {
                my $q = $copy;
                my ( $k, $flips );

                # Flip: reverse the first $k bytes, where $k is the
                # current leading element, until a 1 comes to the front.
                for ( $flips = 0; ( $k = ord( $q ) ) != 1; $flips++ ) {
                    substr( $q, 0, $k ) = reverse substr( $q, 0, $k );
                }
                if ( $flips > $maxflips ) {
                    $maxflips = $flips;
                    @max_sequence = ();
                }
                push @max_sequence, $copy if ( $maxflips == $flips );
            }
        }
        else {
            fannkuch( $copy, 1 + $level );
        }

        # Swap the two adjacent bytes at positions $index-1 and $index
        # (a no-op when $index is 0).
        substr( $copy, $index - 1, 2 ) = reverse substr( $copy, $index - 1, 2 );
    } while $index--;
    return $maxflips;
}
__END__
P:\test>513179-3
Pfannkuchen(1) = 0 for:
0.000368 elapsed seconds.

Pfannkuchen(2) = 1 for:
21
0.000206 elapsed seconds.

Pfannkuchen(3) = 2 for:
231
312
0.000298 elapsed seconds.

Pfannkuchen(4) = 4 for:
2413
3142
0.000434 elapsed seconds.

Pfannkuchen(5) = 7 for:
31452
0.001012 elapsed seconds.

Pfannkuchen(6) = 10 for:
365142
415263
416523
456213
564132
0.005817 elapsed seconds.

Pfannkuchen(7) = 16 for:
3146752
4762153
0.039062 elapsed seconds.

Pfannkuchen(8) = 22 for:
61578324
0.339917 elapsed seconds.

Pfannkuchen(9) = 30 for:
615972834
3.313848 elapsed seconds.

Pfannkuchen(10) = 38 for:
59186210473
35.109115 elapsed seconds.

It does limit the algorithm to a maximum of 255 elements (without modifying it to use Unicode characters instead of bytes), but I don't think anyone will be waiting around long enough to notice :)
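
The core trick is easy to see in isolation. Here is a minimal sketch (mine, not part of the script above) of the packed-byte representation, which is also where that 255-element ceiling comes from:

    # Each element is stored as a single byte via pack 'C*', so values
    # are limited to 0..255.  A prefix reversal is then just a
    # substr-and-reverse on the string.
    my $p = pack 'C*', 3, 1, 4, 2;                       # permutation as a byte string
    my $k = ord $p;                                      # first element: 3
    substr( $p, 0, $k ) = reverse substr( $p, 0, $k );   # flip the first $k elements
    print join( ' ', unpack 'C*', $p ), "\n";            # prints: 4 1 3 2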



Re^2: Speed/Efficiency tweaks for a fannkuch benchmark script? (300%)
by thundergnat (Deacon) on Dec 01, 2005 at 19:45 UTC

    Cool! I had thought about trying to use strings instead of arrays but got hung up on how to handle multi-digit numbers. It didn't occur to me to use character ordinals instead.
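
    (An illustrative sketch, not from the original comment: ordinals make every element exactly one byte, so multi-digit values never need parsing.)

        my $perm = join '', map chr, 1 .. 10;   # element 10 is chr(10): a single byte
        print length $perm;                     # prints 10, not 11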

Re^2: Speed/Efficiency tweaks for a fannkuch benchmark script? (300%)
by robin (Chaplain) on Dec 01, 2005 at 19:54 UTC
    That's great. I'm surprised it makes such a massive difference. I wonder whether the bottleneck in the original code was the @q = @$copy, because that's the only explanation I can think of for why this change gives such a big speed improvement.

      Far and away, by an order of magnitude, the most computationally expensive line is

         count  wall-time  cpu-time  line#
      22169434   157.7315  326.3050  26:  @q[ 0 .. $k-1 ] = reverse @q[ 0 .. $k-1 ];

      That is no surprise really, since it is run an order of magnitude more times than any other line--but it is also, quite innocuously, doing a lot of work:

      1. Generating two index lists;
      2. using those to take two slices of @q, producing two more lists (a & b);
      3. one of those lists (b) is then reversed to produce another list (c);
      4. and finally that list (c) is assigned back through the first slice (a) (via aliases?) to complete the reversal--spelled out in the sketch below.
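
      Spelled out, that single statement does roughly the following (my sketch of the equivalent steps, not the profiled code itself):

          my @q = ( 3, 1, 4, 2, 5 );      # example data
          my $k = 3;
          my @range = ( 0 .. $k - 1 );    # 1. generate an index list (one per side)
          my @b     = @q[@range];         # 2. slice the first $k elements out of @q
          my @c     = reverse @b;         # 3. reverse that list to make a new one
          @q[@range] = @c;                # 4. assign it back through the lvalue slice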

      Slices are a great notational convenience and what VHLLs are all about, but they do hide a good deal of complexity.

      The other expensive lines in order of cost are:

         count  wall-time  cpu-time  line#
       4500244    18.0823   53.2090  39:  @copy[ $index - 1, $index ] = @copy[ $index, $index - 1 ]
       4037913    10.6528   42.0950  22:  if ( $copy[0] != 1 and $copy[-1] != @copy ) {
       3265920    11.2764   37.4510  32:  push @max_sequence, join '', @copy, "\n"
       3265920    12.2326   37.4380  23:  my @q = @copy;

      For comparison, here are the same lines from profiling the string version:

         count  wall-time  cpu-time  line#
      22169434    96.3426  273.7840  28:  substr( $q, 0, $k ) = reverse substr( $q, 0, $k );
       4500244    20.3415   55.0800  41:  substr( $copy, $index - 1, 2 ) = reverse substr( $copy, $index - 1, 2 );
       4037913    11.3571   43.1890  22:  if( ord( $copy ) != 1
       3265920     8.6338   35.9450  25:  my $q = $copy;
       3265920     9.1315   34.8530  34:  push @max_sequence, $copy

      They make it easy to see where the saving came from.
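
      The same effect can be seen directly with a quick benchmark (my sketch; the labels and the size of 9 elements are illustrative):

          use Benchmark qw( cmpthese );

          my @q = ( 1 .. 9 );
          my $s = pack 'C*', 1 .. 9;
          my $k = 9;

          # Run each style for at least a CPU-second and compare rates.
          cmpthese( -1, {
              array_slice   => sub { @q[ 0 .. $k - 1 ] = reverse @q[ 0 .. $k - 1 ] },
              string_substr => sub { substr( $s, 0, $k ) = reverse substr( $s, 0, $k ) },
          } );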

      I do love Devel::SmallProf. Line-by-line profiling is so much more useful than function-by-function. Of course, it does take an inordinately long time to run--hence the delay in my responding while I waited for the second profile to complete.
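
      (If you want to reproduce these figures: Devel::SmallProf needs no changes to the script; it is loaded through the debugger hook and writes its per-line report to smallprof.out by default. The script name below is taken from the output above.)

          perl -d:SmallProf 513179-3.pl   # per-line counts land in smallprof.out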

