Welcome to the Monastery PerlMonks

### Speed/Efficiency tweaks for a fannkuch benchmark script?

by thundergnat (Deacon)
 on Dec 01, 2005 at 03:41 UTC Need Help??

thundergnat has asked for the wisdom of the Perl Monks concerning the following question:

I was idly looking through The Computer Language Shootout Benchmarks and was mildly suprised at how slow many of the Perl implementations were in relation to other languages. In investigating more closely, it is apparent that most of the perl code there is basically C and Fortran written in Perl. (Without the benefit of inlining.) No wonder it doesn't fare very well.

In the implementation FAQ it states "Use the same algorithm and data structures.[In each implemantation.] As-far-as possible the languages should be doing the same operations. [...] The [...] programs often seem naive and unidiomatic."

It discourages using language specific idioms and clever programmer tricks to increase performance... What fun is that? ;-)

For amusements sake, I decided to try to rewrite one more efficiently, and selected the fannkuch benchmark.

The fannkuchs1 program should:

• "Take a permutation of {1,...,n}, for example: {4,2,1,5,3}.
• "Take the first element, here 4, and reverse the order of the first 4 elements: {5,1,2,4,3}.
• "Repeat this until the first element is a 1, so flipping won't change anything more: {3,4,2,1,5}, {2,4,3,1,5}, {4,2,3,1,5}, {1,3,2,4,5}.
• "Count the number of flips, here 5.
• "Do this for all n! permutations, and record the maximum number of flips needed for any permutation for n = 1..10.
• Write the maximum number of flips and the sequences that required them. (Note: this is different from the benchmark site. They want the first thirty permutations returned instead of the max flip sequences, which locks you into using a specific permutator and is less interesting. I think the actual sequences that require the maximum flips is more useful information.)

[1] Fannkuch is an abbreviation for the German word Pfannkuchen, or pancakes, in analogy to flipping pancakes.

The original, "C written in Perl" code, lightly modified to return the max sequences instead of the first 30 permutations and do some timings: (No warnings, no strict, and wouldn't pass them if they were there.)

```use Time::HiRes qw( gettimeofday tv_interval );

for my \$num(1..10){
my @start_time = gettimeofday();
print "Pfannkuchen(\$num) = ".fannkuch (\$num)." for:\n";
print sort @max_sequence;
my @end_time = gettimeofday();
print tv_interval ( \@start_time, \@end_time )," elapsed seconds.\n\n"
+;
};

sub fannkuch {
my \$n = shift;
my @p;
my @q;
my \$tmp;
my \$maxflips = 0;
my \$flips;

for (\$i=0; \$i < \$n; \$i++) { \$p[\$i] = 1 + \$i; }

BRK: for (;;) {

if (\$p[0] != 1) {

@q = @p;

for (\$flips = 0; (\$k = \$q[0]) != 1; \$flips++) {
for (\$k--,\$i=0; \$i < \$k; \$i++, \$k--) {
\$tmp = \$q[\$i];
\$q[\$i] = \$q[\$k];
\$q[\$k] = \$tmp;
}
}
if (\$flips > \$maxflips) {
\$maxflips = \$flips;
@max_sequence = ();
}
push @max_sequence, join '', @p,"\n" if (\$maxflips eq \$fli
+ps);
}

\$k = \$j = 0;
for (\$i=1; \$i < \$n; \$i++) {
\$j = \$i if (\$p[\$i-1] < \$p[\$i]);
\$k = \$i if (\$j && \$p[\$i] > \$p[\$j-1]);
}
last BRK if (!\$j);

\$tmp = \$p[\$j-1];
\$p[\$j-1] = \$p[\$k];
\$p[\$k] = \$tmp;

for (\$i=\$j,\$j=\$n-1; \$i < \$j; \$i++, \$j--) {
\$tmp = \$p[\$j];
\$p[\$j] = \$p[\$i];
\$p[\$i] = \$tmp;
}
}

return \$maxflips;
}

And here's my whack at idiomatic, strict and warnings clean, slightly more efficient code:

Permutation algorithm blatently stolen from robin's journal.

```use warnings;
use strict;
use Time::HiRes qw( gettimeofday tv_interval );

my \$maxflips = 0;
my @max_sequence;

for my \$num ( 1 .. 10 ) {
my @start_time = gettimeofday();
@max_sequence = ();
print "Pfannkuchen(\$num) = " . fannkuch( [ 1 .. \$num ] ) . " for:\
+n";
print sort @max_sequence;
my @end_time = gettimeofday();
print tv_interval ( \@start_time, \@end_time ), " elapsed seconds.
+\n\n";
}

sub fannkuch {
my ( \$aref, \$level ) = ( @_, 0 );
my ( \$index, \$copy, \$ok ) = ( \$level, [@\$aref], \$level + 1 == @\$ar
+ef );
do {
if (\$ok) {
if ( \$copy->[0] != 1 and \$copy->[-1] != @\$copy ) {
my @q = @\$copy;
#                my ( \$i, \$k, \$flips );
my ( \$k, \$flips );
for ( \$flips = 0 ; ( \$k = \$q[0] ) != 1 ; \$flips++ ) {
#                    for ( \$k--, \$i = 0 ; \$i < \$k ; \$i++, \$k-- ) {
#                       @q[ \$i, \$k ] = @q[ \$k, \$i ];
#                    }
@q[ 0 .. \$k-1 ] = reverse @q[ 0 .. \$k-1 ];
}
if ( \$flips > \$maxflips ) {
\$maxflips     = \$flips;
@max_sequence = ();
}
push @max_sequence, join '', @\$copy, "\n"
if ( \$maxflips == \$flips );
}
}
else {
fannkuch( \$copy, 1 + \$level );
}
@\$copy[ \$index - 1, \$index ] = @\$copy[ \$index, \$index - 1 ]
if \$index != 0;

} while \$index-- > 0;
return \$maxflips;
}
Update: Oops. Deleted a few extreaneous lines.
Update 2: Modified flipping algorithm to be more perlish. (Commented out original lines.)

This is nearly twice as fast as the original but still has room for improvement I'm sure. (My Perl skills are modest at best.) Can anyone suggest any other speed/efficiency tweaks? (Other than "Write it in C or Fortran". I know, if I am looking for raw performance, Perl is likely not the way to go. This is just a programming excercise.)

Replies are listed 'Best First'.
Re: Speed/Efficiency tweaks for a fannkuch benchmark script?
by robin (Chaplain) on Dec 01, 2005 at 13:34 UTC
That's a mighty good permutation algorithm you're using. I'm curious as to where you came across it. (A few years ago I did a fair bit of investigation into permutation generation, and this algorithm was the fastest pure-Perl one I could find by quite a margin. Your code looks remarkably similar to mine!)

It may also be worth looking at the algorithm used in tye's Algorithm::Loops, which is jolly clever. I haven't compared it directly with this one, though if I had to guess, I would guess this one is probably still faster.

PS. I still haven't figured out how you can get away with doing nothing when \$copy->[-1] == @\$copy. What's the trick?

Update: Okay, I've got it. That's pretty clever. Here's my proof that it works.

Lemma: for every \$n ≥ 1, there is some permutation of (1..\$n+1) that takes more flips than any permutation of (1..\$n).
Proof. Let @a be a permutation of (1..\$n) that takes as many flips as possible, say it takes \$flip flips; and consider the permutation (\$n+1, reverse @a). Clearly this takes (\$flip+1) flips.

Now, if \$copy->[-1] == @\$copy then the last element can never be moved by the flipping, and so this will take the same number of flips as @\$copy[0..\$#\$copy-1]. By the Lemma, this is less than the maximum number of flips.

That's a mighty good permutation algorithm you're using. I'm curious as to where you came across it. (A few years ago I did a fair bit of investigation into permutation generation, and this algorithm was the fastest pure-Perl one I could find by quite a margin. Your code looks remarkably similar to mine!)
That was a subroutine I had found on the web a while ago when I was looking for how to do fast permutations. I didn't really remember exactly where I got it, (I had stored it away in my folder of cool and interesting perl stuff,) but in looking at your web page, there is little doubt that it was from there. Sorry for not attributing it correctly, and thank you for making it available.
No need to be sorry. I'm glad you found it useful!

In any case, I got the algorithm from a C program posted to alt.sources in 1990 by Matt Day, who I haven't been able to track down.

I noticed that your fast permutation algorithm passes an arrayref, but then copies the values out. There's no particular savings there compared to passing the array. That and a few other micro-optimizations squeezed 210+% more speed out (if you don't print — the savings are probably less significant with printing turned on.) Code (update: fixed, reducing savings, as expected) follows.

Caution: Contents may have been coded under pressure.
Interesting, but broken. (Uncomment the print and you'll see what I mean!)

You can fix it by adding

```  local @_ = @_;
to the beginning of the sub, but that presumably makes it slower again.
Re: Speed/Efficiency tweaks for a fannkuch benchmark script? (300%)
by BrowserUk (Patriarch) on Dec 01, 2005 at 18:57 UTC

This tweaks the implementation rather than the algorithm, which means it comes into the realm of what are usually called micro-optimisations--but it does achieve a near 300% speedup:

```#! perl -slw
use strict;
use Time::HiRes qw( gettimeofday tv_interval );

my \$maxflips = 0;
my @max_sequence;

for my \$num ( 1 .. 10 ) {
my @start_time = gettimeofday();
@max_sequence = ();
print "Pfannkuchen(\$num) = " . fannkuch( pack 'C*', 1 .. \$num ) .
+" for:";
print unpack 'C*', \$_ for sort @max_sequence;
my @end_time = gettimeofday();
print tv_interval ( \@start_time, \@end_time ), " elapsed seconds.
+\n";
}

sub fannkuch {
my ( \$a, \$level ) = ( @_, 0 );
my ( \$index, \$ok, \$copy,  ) = ( \$level,  \$level + 1 == length( \$a
+), \$a );
do {
if (\$ok) {
if( ord( \$copy ) != 1
and ord( substr( \$copy, -1 ) ) != length( \$copy )
) {
my \$q = \$copy;
my ( \$k, \$flips );
for ( \$flips = 0; ( \$k = ord( \$q ) ) != 1; \$flips++ )
+{
substr( \$q, 0, \$k ) = reverse substr( \$q, 0, \$k );
}
if ( \$flips > \$maxflips ) {
\$maxflips     = \$flips;
@max_sequence = ();
}
push @max_sequence, \$copy
if ( \$maxflips == \$flips );
}
}
else {
fannkuch( \$copy, 1 + \$level );
}
substr( \$copy, \$index - 1, 2 ) = reverse substr( \$copy, \$index
+ -1, 2 );

} while \$index--;
return \$maxflips;
}

__END__
P:\test>513179-3
Pfannkuchen(1) = 0 for:
0.000368 elapsed seconds.

Pfannkuchen(2) = 1 for:
21
0.000206 elapsed seconds.

Pfannkuchen(3) = 2 for:
231
312
0.000298 elapsed seconds.

Pfannkuchen(4) = 4 for:
2413
3142
0.000434 elapsed seconds.

Pfannkuchen(5) = 7 for:
31452
0.001012 elapsed seconds.

Pfannkuchen(6) = 10 for:
365142
415263
416523
456213
564132
0.005817 elapsed seconds.

Pfannkuchen(7) = 16 for:
3146752
4762153
0.039062 elapsed seconds.

Pfannkuchen(8) = 22 for:
61578324
0.339917 elapsed seconds.

Pfannkuchen(9) = 30 for:
615972834
3.313848 elapsed seconds.

Pfannkuchen(10) = 38 for:
59186210473
35.109115 elapsed seconds.

It does limit the algorithm to a maximum of 255 elements (without modifying it to go unicode), but I don't think anyone will be waiting around long enough to notice:)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Cool! I had thought about trying to use strings instead of arrays but got hung up on how to handle multi-digit numbers. It didn't occur to me to use character ordinals instead.

That's great. I'm surprised it makes such a massive difference. I wonder whether the bottleneck in the original code was the @q = @\$copy, because that's the only explanation I can think of for why this change gives such a big speed improvement.

Far and away, by an order of magnitude, the most computationally expensive line is

``` count     wall-time cpu-time line#
22169434  157.7315  326.3050    26:
@q[ 0 .. \$k-1 ] = reverse @q[ 0 .. \$k-1 ];

Which is no surprise really since it is run an order of magnitude more times than any other line, but is also innocuously doing a lot of operations.

1. Generating two lists;
2. using those to slice across two arrays to produce two more lists (a & b);
3. one of those lists (b) is then inverted to produce another list (c);
4. and finally that list (c) is assigned the first list (a) (via aliases?) to complete the swap.

Slices are a great notational convenience and what VHLLs are all about, but they do hide a deal of complexity.

The other expensive lines in order of cost are:

```  4500244   18.0823   53.2090    39:
@copy[ \$index - 1, \$index ] = @copy[ \$index, \$index - 1 ]
4037913   10.6528   42.0950    22:
if ( \$copy[0] != 1 and \$copy[-1] != @copy ) {
3265920   11.2764   37.4510    32:
push @max_sequence, join '', @copy, "\n"
3265920   12.2326   37.4380    23:
my @q = @copy;

For comparison, here are the same lines from profiling the string version:

``` 22169434   96.3426  273.7840    28:
substr( \$q, 0, \$k ) = reverse substr( \$q, 0, \$k );
4500244   20.3415   55.0800    41:
substr( \$copy, \$index - 1, 2 ) = reverse substr( \$copy, \$index
+ -1, 2 );
4037913   11.3571   43.1890    22:
if( ord( \$copy ) != 1
3265920    8.6338   35.9450    25:
my \$q = \$copy;
3265920    9.1315   34.8530    34:
push @max_sequence, \$copy

They make it easy to see where the saving came from.

I do love Devel::SmallProf. Line-by-line profiling is so much more useful that function-by-function. Of course, it does take an inordinately long time to run, hence the delay on my responding while I waited for the second profile to complete.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Speed/Efficiency tweaks for a fannkuch benchmark script?
by EdwardG (Vicar) on Dec 01, 2005 at 12:04 UTC
Indeed. It's like comparing a manual screwdriver to an electric screwdriver. The manual screwdriver wins until you pull the trigger...

Re: Speed/Efficiency tweaks for a fannkuch benchmark script?
by robin (Chaplain) on Dec 01, 2005 at 15:27 UTC
Taking your clever precheck a stage further gives a 50%-ish speed improvement on my iBook:
(Update: On a faster Linux PC the improvement is much less pronounced: only about 8–10%.)
```use List::Util qw( min max );
sub fannkuch {
my ( \$aref, \$level ) = ( @_, 0 );
my ( \$index, \$copy, \$ok ) = ( \$level, [@\$aref], \$level + 1 == @\$ar
+ef );
do {
if (\$ok) {
if (max(@\$copy[0..(\$copy->[0] - 1)]) != \$copy->[0]
&& min(@\$copy[(\$copy->[-1] - 1)..\$#\$copy]) != \$copy->[-1
+]) {

my @q = @\$copy;
my ( \$k, \$flips );
for ( \$flips = 0 ; ( \$k = \$q[0] ) != 1 ; \$flips++ ) {
@q[ 0 .. \$k-1 ] = reverse @q[ 0 .. \$k-1 ];
}
if ( \$flips > \$maxflips ) {
\$maxflips     = \$flips;
@max_sequence = ();
}
push @max_sequence, join '', @\$copy, "\n"
if ( \$maxflips == \$flips );
}
}
else {
fannkuch( \$copy, 1 + \$level );
}
@\$copy[ \$index - 1, \$index ] = @\$copy[ \$index, \$index - 1 ]
if \$index != 0;

} while \$index-- > 0;
return \$maxflips;
}
Incidentally, I tried changing the code to use Algorithm::FastPermute (which implements the same permutation algorithm in C) and the runtime actually increased. I don't know why that's happening. It may be an unfortunate side-effect of the stability improvements in my latest version of A::FP, or it may be something else entirely. Update: no, it's nothing to do with the recent changes. I get the same result using an old version too.

Ah. Very clever. An obvious extention of the prechecks I was doing, but I did't find a way to implement them without adding more overhead than I was saving.

Nice!

Hmm, the only problem with this is that my optimised routine gives the wrong answer. The bug is that, if the largest element is at the beginning or the smallest is at the end of the list, my test will skip it when it shouldn't. So the condition really needs to be something like:
```            my (\$n, \$first, \$last) = (\$#\$copy, @\$copy[0,-1]);
if ( ( \$first == \$n+1 || \$first != max(@\$copy[0..(\$first -
+ 1)]) )
&& ( \$last == 1 || \$last != min(@\$copy[(\$last - 1)..\$n])
+ ))
{
...
Now it's a little faster than the original on my laptop, and a little slower on the Linux box.

Yes, but... The only case where it would yeild a wrong answer is when \$n == 2. So just check for that.

```if ((max( @\$copy[ 0 .. ( \$copy->[0] - 1 ) ] ) != \$copy->[0]
&& min( @\$copy[ ( \$copy->[-1] - 1 ) .. \$#\$copy ] ) != \$copy->[-1])
|| @\$copy == 2 )
{

Or am I missing something?

Re: Speed/Efficiency tweaks for a fannkuch benchmark script?
by robin (Chaplain) on Dec 01, 2005 at 23:35 UTC
I can obtain a modest (< 10%) but consistent speedup by modifying the permutation generator so that it doesn't even generate sequences that would fail the extended precheck. Like this:
```sub fannkuch {
my ( \$a, \$level, \$split ) = ( @_, 0, 1 );
my ( \$index, \$ok, \$copy ) = ( \$level, \$level + 1 == length( \$a ),
+\$a );

if (\$ok) {
# print "Before munging: (\$split) ", unpack('C*', \$copy);
\$index = \$split - 1;
substr(\$copy, \$index, 0) = chop(\$copy);
}

do {
if (\$ok) {
# print "(\$split) ", unpack('C*', \$copy);
my \$q = \$copy;
my ( \$k, \$flips );
for ( \$flips = 0; ( \$k = ord( \$q ) ) != 1; \$flips++ ) {
substr( \$q, 0, \$k ) = reverse substr( \$q, 0, \$k );
}
if ( \$flips >= \$maxflips ) {
if ( \$flips == \$maxflips) {
push @max_sequence, \$copy;
}
else {
\$maxflips     = \$flips;
@max_sequence = (\$copy);
}
}
}
else {
fannkuch( \$copy, 1 + \$level, \$split );
\$split = \$level + 1 if \$index == \$split;
}
substr( \$copy, \$index - 1, 2 ) = reverse substr( \$copy, \$index
+ -1, 2 );

} while \$index--;
return \$maxflips;
}
The idea is that \$split is the smallest positive number for which all the elements in substr(\$copy, 0, \$split) are less than or equal to \$split.

This table shows a few examples:
sequence \$split
153421
213452
231453
234154
314254
315425
(The commented-out debugging statements may make it easier to follow what's going on. They certainly helped me to understand it as I was writing it.)

On each top-level run (i.e. where \$ok is true) we can skip directly to the first permutation where the moving element (which starts out as the last one in the list) has passed the split-point.

I've also reordered the maxflips logic, which gives an additional small improvement.

On an unloaded Linux PC, I get the following results. This is the output from diff --side-by-side -W 80, with the output from BrowserUk's code on the left, and mine on the right:

```[rpc142: /tmp]\$ diff --side-by-side -W 80 fann-char-orig.out fann-char
+.out
Pfannkuchen(1) = 0 for:                 Pfannkuchen(1) = 0 for:
8.7e-05 elapsed seconds.              | 1
> 0.000118 elapsed seconds.

Pfannkuchen(2) = 1 for:                 Pfannkuchen(2) = 1 for:
21                                      21
7.7e-05 elapsed seconds.              | 6.4e-05 elapsed seconds.

Pfannkuchen(3) = 2 for:                 Pfannkuchen(3) = 2 for:
231                                     231
312                                     312
0.000134 elapsed seconds.             | 0.000133 elapsed seconds.

Pfannkuchen(4) = 4 for:                 Pfannkuchen(4) = 4 for:
2413                                    2413
3142                                    3142
0.000307 elapsed seconds.             | 0.000281 elapsed seconds.

Pfannkuchen(5) = 7 for:                 Pfannkuchen(5) = 7 for:
31452                                   31452
0.001292 elapsed seconds.             | 0.001164 elapsed seconds.

Pfannkuchen(6) = 10 for:                Pfannkuchen(6) = 10 for:
365142                                  365142
415263                                  415263
416523                                  416523
456213                                  456213
564132                                  564132
0.008305 elapsed seconds.             | 0.007341 elapsed seconds.

Pfannkuchen(7) = 16 for:                Pfannkuchen(7) = 16 for:
3146752                                 3146752
4762153                                 4762153
0.062369 elapsed seconds.             | 0.056875 elapsed seconds.

Pfannkuchen(8) = 22 for:                Pfannkuchen(8) = 22 for:
61578324                                61578324
0.541676 elapsed seconds.             | 0.498606 elapsed seconds.

Pfannkuchen(9) = 30 for:                Pfannkuchen(9) = 30 for:
615972834                               615972834
5.347029 elapsed seconds.             | 4.943895 elapsed seconds.

Pfannkuchen(10) = 38 for:               Pfannkuchen(10) = 38 for:
59186210473                             59186210473
58.118665 elapsed seconds.            | 54.24672 elapsed seconds.

Some more micro-optimization, proably getting into foolish optimization at this point. Reduced unnecesary copying of variables, performing some calculations used many times outside the loop, added a special check for sequences that start with n.

The line checking for sequences starting with n needs some explanation. It is probably the most obscure.

There are two givens for this algorithm.

• If (\$next == \$length) then (\$level + 1 == \$length).
• \$maxflips DOES NOT get reset between runs.

So: if the first character is n then the minimum \$maxflips{n} is \$maxflips{n-1} + 1. It is impossible to have a sequence to take less than length - 1 flips, so if \$length - 1 is less than \$maxflips{n-1} then it impossible for a sequence that starts with n to have more than \$maxflips{n-1} + 1. Therefore, no need to check them.

Rather than compare \$length to \$maxflips + 1, since \$level == \$length - 1, I save the calculation in the loop and compare \$level to \$maxflips.

```sub fannkuch {
my ( \$copy, \$level, \$split ) = ( @_, 0, 1 );
my ( \$index, \$next ) = ( \$level, \$level + 1 );
my \$length = length(\$copy);

if (\$next == \$length) {
(\$index, \$split) = (\$split - 1, \$level);
substr(\$copy, \$index, 0) = chop(\$copy);
}
do {
if (\$next == \$length) {
unless ( ord(\$copy) == \$length and \$level < \$maxflips )
{
my \$q = \$copy;
my ( \$k, \$flips );
for ( \$flips = 0; ( \$k = ord( \$q ) ) != 1; \$flips++ )
+{
substr( \$q, 0, \$k ) = reverse substr( \$q, 0, \$k );
}
if ( \$flips >= \$maxflips ) {
if ( \$flips == \$maxflips) {
push @max_sequence, \$copy;
}
else {
\$maxflips     = \$flips;
@max_sequence = (\$copy);
}
}
}
}
else {
fannkuch( \$copy, \$next, \$split );
}
substr( \$copy, \$index - 1, 2 ) = reverse substr( \$copy, \$index
+ - 1, 2 );

\$split = \$next if \$index == \$split;
} while \$index--;
return \$maxflips;
}

Side by side with your code: It is only about a 4-6% increase, but it is repeatable.

```Your sub                          My sub

Pfannkuchen(1) = 0 for:            Pfannkuchen(1) = 0 for:
1                                  1
0.000351 elapsed seconds.          0.000274 elapsed seconds.

Pfannkuchen(2) = 1 for:            Pfannkuchen(2) = 1 for:
21                                 21
0.000196 elapsed seconds.          0.00016 elapsed seconds.

Pfannkuchen(3) = 2 for:            Pfannkuchen(3) = 2 for:
231                                231
312                                312
0.000275 elapsed seconds.          0.000225 elapsed seconds.

Pfannkuchen(4) = 4 for:            Pfannkuchen(4) = 4 for:
2413                               2413
3142                               3142
0.000348 elapsed seconds.          0.000288 elapsed seconds.

Pfannkuchen(5) = 7 for:            Pfannkuchen(5) = 7 for:
31452                              31452
0.000701 elapsed seconds.          0.000634 elapsed seconds.

Pfannkuchen(6) = 10 for:           Pfannkuchen(6) = 10 for:
365142                             365142
415263                             415263
416523                             416523
456213                             456213
564132                             564132
0.00376 elapsed seconds.           0.003357 elapsed seconds.

Pfannkuchen(7) = 16 for:           Pfannkuchen(7) = 16 for:
3146752                            3146752
4762153                            4762153
0.025629 elapsed seconds.          0.023704 elapsed seconds.

Pfannkuchen(8) = 22 for:           Pfannkuchen(8) = 22 for:
61578324                           61578324
0.226504 elapsed seconds.          0.209445 elapsed seconds.

Pfannkuchen(9) = 30 for:           Pfannkuchen(9) = 30 for:
615972834                          615972834
2.209246 elapsed seconds.          2.088572 elapsed seconds.

Pfannkuchen(10) = 38 for:          Pfannkuchen(10) = 38 for:
59186210473                        59186210473
24.218115 elapsed seconds.         23.077981 elapsed seconds.
I made a couple of small improvements to my code a few minutes after posting:
• removed an unnecessary assignment to \$split;
• moved the other \$split assignment into the else clause.
Your code is based on the original. Sorry I didn't flag the update, I didn't think anyone would have already d/led the code so quickly!
Re: Speed/Efficiency tweaks for a fannkuch benchmark script?
by thundergnat (Deacon) on Dec 02, 2005 at 02:31 UTC

Veering off in a somewhat random direction... While playing around with this code and the Pfannkuchen(n) for n = 1 .. 10, I noticed something unique about the sequence 416523, which is one of the max sequences for Pfannkuchen(6).

Re: Speed/Efficiency tweaks for a fannkuch benchmark script?
by robin (Chaplain) on Dec 02, 2005 at 12:37 UTC
I can squeeze another 20% or so out of it – actually 23% on my systems – with some more micro-optimizations. The most significant change is in the inner loop, where I’ve changed
```  substr( \$q, 0, \$k ) = reverse substr( \$q, 0, \$k );
to
```  \$q = reverse(substr(\$q, 0, \$k)) . substr(\$q, \$k);
Here's the code:
```sub fannkuch {
use bytes;    # This makes it fractionally faster
my ( \$copy, \$level, \$split ) = ( @_, 0, 1 );
my ( \$index, \$next, \$length ) = ( \$level, \$level + 1, length( \$cop
+y ) );

if (\$next == \$length) {
\$index = \$split - 1;
substr(\$copy, \$index, 0) = chop(\$copy);
}

my ( \$q, \$k );
do {
if (\$next == \$length) {
if ((\$k = ord(\$q = \$copy)) != \$length
||  \$level >= \$maxflips)
{
# Declaring \$flips in here means we can reset it
# with a single op (compared with the three you
# need for C<\$flips = 0>).
my \$flips;

# This is a touch faster than a "proper" loop,
# because it doesn't push a new context.
\$q = reverse(substr( \$q, 0, \$k )) . substr(\$q, \$k),
++\$flips
while (\$k=ord(\$q)) != 1;

no warnings "uninitialized"; # \$flips may be undef
if ( \$flips >= \$maxflips ) {
if ( \$flips == \$maxflips) {
push @max_sequence, \$copy;
}
else {
(\$maxflips, @max_sequence) = (\$flips, \$copy);
}
}
}
}
else {
fannkuch( \$copy, \$next, \$split );
\$split = \$next if \$index == \$split;
}
substr(\$copy, \$index-1, 2) = reverse substr(\$copy, \$index-1, 2
+);

} while \$index--;
\$maxflips;  # faster than an explicit return
}

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://513179]
Approved by GrandFather
Front-paged by sk
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (4)
As of 2024-02-24 09:46 GMT
Voting Booth?
My favourite way to spend a leap day ...

Results (22 votes). Check out past polls.