in reply to Speed/Efficiency tweaks for a fannkuch benchmark script?

This tweaks the implementation rather than the algorithm, which puts it firmly in the realm of what are usually called micro-optimisations--but it does achieve a nearly 300% speedup:

#! perl -slw
use strict;
use Time::HiRes qw( gettimeofday tv_interval );

my $maxflips = 0;    # highest flip count found so far
my @max_sequence;    # the permutation(s) that achieved it

for my $num ( 1 .. 10 ) {
    my @start_time = gettimeofday();
    @max_sequence = ();

    # Permutations are packed byte strings: one byte per element.
    print "Pfannkuchen($num) = " . fannkuch( pack 'C*', 1 .. $num ) . " for:";
    print unpack 'C*', $_ for sort @max_sequence;

    my @end_time = gettimeofday();
    print tv_interval( \@start_time, \@end_time ), " elapsed seconds.\n";
}

sub fannkuch {
    my ( $a, $level ) = ( @_, 0 );
    my ( $index, $ok, $copy ) = ( $level, $level + 1 == length( $a ), $a );
    do {
        if ( $ok ) {
            # The usual fannkuch shortcut: only count flips for
            # permutations that neither start with 1 nor end with n.
            if ( ord( $copy ) != 1
                and ord( substr( $copy, -1 ) ) != length( $copy ) ) {
                my $q = $copy;
                my ( $k, $flips );

                # Flip: reverse the first $k bytes, where $k is the
                # current leading element, until a 1 comes to the front.
                for ( $flips = 0; ( $k = ord( $q ) ) != 1; $flips++ ) {
                    substr( $q, 0, $k ) = reverse substr( $q, 0, $k );
                }
                if ( $flips > $maxflips ) {
                    $maxflips = $flips;
                    @max_sequence = ();
                }
                push @max_sequence, $copy if ( $maxflips == $flips );
            }
        }
        else {
            fannkuch( $copy, 1 + $level );
        }

        # Swap the two adjacent bytes at positions $index-1 and $index
        # (a no-op when $index is 0).
        substr( $copy, $index - 1, 2 ) = reverse substr( $copy, $index - 1, 2 );
    } while $index--;
    return $maxflips;
}
__END__
P:\test>513179-3
Pfannkuchen(1) = 0 for:
0.000368 elapsed seconds.

Pfannkuchen(2) = 1 for:
21
0.000206 elapsed seconds.

Pfannkuchen(3) = 2 for:
231
312
0.000298 elapsed seconds.

Pfannkuchen(4) = 4 for:
2413
3142
0.000434 elapsed seconds.

Pfannkuchen(5) = 7 for:
31452
0.001012 elapsed seconds.

Pfannkuchen(6) = 10 for:
365142
415263
416523
456213
564132
0.005817 elapsed seconds.

Pfannkuchen(7) = 16 for:
3146752
4762153
0.039062 elapsed seconds.

Pfannkuchen(8) = 22 for:
61578324
0.339917 elapsed seconds.

Pfannkuchen(9) = 30 for:
615972834
3.313848 elapsed seconds.

Pfannkuchen(10) = 38 for:
59186210473
35.109115 elapsed seconds.

It does limit the algorithm to a maximum of 255 elements (without modifying it to use Unicode characters instead of bytes), but I don't think anyone will be waiting around long enough to notice :)
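
The core trick is easy to see in isolation. Here is a minimal sketch (mine, not part of the script above) of the packed-byte representation, which is also where that 255-element ceiling comes from:

    # Each element is stored as a single byte via pack 'C*', so values
    # are limited to 0..255.  A prefix reversal is then just a
    # substr-and-reverse on the string.
    my $p = pack 'C*', 3, 1, 4, 2;                       # permutation as a byte string
    my $k = ord $p;                                      # first element: 3
    substr( $p, 0, $k ) = reverse substr( $p, 0, $k );   # flip the first $k elements
    print join( ' ', unpack 'C*', $p ), "\n";            # prints: 4 1 3 2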



Re^2: Speed/Efficiency tweaks for a fannkuch benchmark script? (300%)
by thundergnat (Deacon) on Dec 01, 2005 at 19:45 UTC

    Cool! I had thought about trying to use strings instead of arrays but got hung up on how to handle multi-digit numbers. It didn't occur to me to use character ordinals instead.
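
    (An illustrative sketch, not from the original comment: ordinals make every element exactly one byte, so multi-digit values never need parsing.)

        my $perm = join '', map chr, 1 .. 10;   # element 10 is chr(10): a single byte
        print length $perm;                     # prints 10, not 11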

Re^2: Speed/Efficiency tweaks for a fannkuch benchmark script? (300%)
by robin (Chaplain) on Dec 01, 2005 at 19:54 UTC
    That's great. I'm surprised it makes such a massive difference. I wonder whether the bottleneck in the original code was the @q = @$copy, because that's the only explanation I can think of for why this change gives such a big speed improvement.

      Far and away, by an order of magnitude, the most computationally expensive line is

         count  wall-time  cpu-time  line#
      22169434   157.7315  326.3050  26:  @q[ 0 .. $k-1 ] = reverse @q[ 0 .. $k-1 ];

      That is no surprise really, since it is run an order of magnitude more times than any other line--but it is also, quite innocuously, doing a lot of work:

      1. Generating two index lists;
      2. using those to take two slices of @q, producing two more lists (a & b);
      3. one of those lists (b) is then reversed to produce another list (c);
      4. and finally that list (c) is assigned back through the first slice (a) (via aliases?) to complete the reversal--spelled out in the sketch below.
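
      Spelled out, that single statement does roughly the following (my sketch of the equivalent steps, not the profiled code itself):

          my @q = ( 3, 1, 4, 2, 5 );      # example data
          my $k = 3;
          my @range = ( 0 .. $k - 1 );    # 1. generate an index list (one per side)
          my @b     = @q[@range];         # 2. slice the first $k elements out of @q
          my @c     = reverse @b;         # 3. reverse that list to make a new one
          @q[@range] = @c;                # 4. assign it back through the lvalue slice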

      Slices are a great notational convenience and what VHLLs are all about, but they do hide a good deal of complexity.

      The other expensive lines in order of cost are:

         count  wall-time  cpu-time  line#
       4500244    18.0823   53.2090  39:  @copy[ $index - 1, $index ] = @copy[ $index, $index - 1 ]
       4037913    10.6528   42.0950  22:  if ( $copy[0] != 1 and $copy[-1] != @copy ) {
       3265920    11.2764   37.4510  32:  push @max_sequence, join '', @copy, "\n"
       3265920    12.2326   37.4380  23:  my @q = @copy;

      For comparison, here are the same lines from profiling the string version:

         count  wall-time  cpu-time  line#
      22169434    96.3426  273.7840  28:  substr( $q, 0, $k ) = reverse substr( $q, 0, $k );
       4500244    20.3415   55.0800  41:  substr( $copy, $index - 1, 2 ) = reverse substr( $copy, $index - 1, 2 );
       4037913    11.3571   43.1890  22:  if( ord( $copy ) != 1
       3265920     8.6338   35.9450  25:  my $q = $copy;
       3265920     9.1315   34.8530  34:  push @max_sequence, $copy

      They make it easy to see where the saving came from.
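
      The same effect can be seen directly with a quick benchmark (my sketch; the labels and the size of 9 elements are illustrative):

          use Benchmark qw( cmpthese );

          my @q = ( 1 .. 9 );
          my $s = pack 'C*', 1 .. 9;
          my $k = 9;

          # Run each style for at least a CPU-second and compare rates.
          cmpthese( -1, {
              array_slice   => sub { @q[ 0 .. $k - 1 ] = reverse @q[ 0 .. $k - 1 ] },
              string_substr => sub { substr( $s, 0, $k ) = reverse substr( $s, 0, $k ) },
          } );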

      I do love Devel::SmallProf. Line-by-line profiling is so much more useful than function-by-function. Of course, it does take an inordinately long time to run--hence the delay in my responding while I waited for the second profile to complete.
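
      (If you want to reproduce these figures: Devel::SmallProf needs no changes to the script; it is loaded through the debugger hook and writes its per-line report to smallprof.out by default. The script name below is taken from the output above.)

          perl -d:SmallProf 513179-3.pl   # per-line counts land in smallprof.out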

