Re^7: [OT] Swapping buffers in place. (Final summation.)

I've been quite entertained by the whole thread all weekend; it's been my diversion from wrestling with getting a new little computer to talk to various data acquisition devices, using little C snippets to do the direct talking and wrapping Perl around them to make it faster to code up and modify. Fortunately I have most of it wired up and set for remote access, so I could work from the comfort of my couch much of the time. I'm remembering how much C I've forgotten...

The algorithm that surprises me the most is the recursive, in that you didn't end up with a lot of stack overhead slowing it down too much or hanging it up-- even just pushing return addresses on the stack and no data it's going to get large for your datasets. Once you showed the manual swaps, I was sure iterative was going to be the answer.

Most of what I thought the reversing algorithm had going for it was a) simple to code, b) sure to work without debugging, and c) you can probably fit it in about 20 bytes of code if you get sent back to 1983.

Replies are listed 'Best First'.

Re^8: [OT] Swapping buffers in place. (Final summation.)
by BrowserUk (Patriarch) on Mar 02, 2015 at 19:38 UTC

The algorithm that surprises me the most is the recursive, in that you didn't end up with a lot of stack overhead slowing it down too much or hanging it up-- even just pushing return addresses on the stack and no data it's going to get large for your datasets.

I did try to find a pathological case for the recursive version. Using a simple sub it is easy to see the steps it goes through for a particular set of parameters:

[0] Perl> sub steps { my($i,$m,$o) = (0,@_); print( "$i: $m $o" ), ++$i,( $m >= $o ? ($m -= $o) : ($o -= $m )) while $m and $o; };;

[0] Perl> steps( 9, 6 );;
0: 9 6
1: 3 6
2: 3 3

[0] Perl> steps( 17, 9 );;
0: 17 9
1: 8 9
2: 8 1
3: 7 1
4: 6 1
5: 5 1
6: 4 1
7: 3 1
8: 2 1
9: 1 1

Its pathological case is when there is a difference of just 1 element between the two buffers. The first step moves the smaller buffer into its final position in one go; but then the odd byte has to be 'rippled' through the rest of the larger buffer to get it (and the rest of the larger buffer) into their final positions.
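For anyone who would rather poke at this in C than in the Perl REPL, here is a rough equivalent of the step counter. It is only a sketch: the names count_steps and count_steps_fast are mine, not anything from the real bufswap code. The second function relies on the observation that a run of repeated subtractions is just an integer division, as in Euclid's algorithm, so the step count is the sum of the quotients.

```c
#include <stdint.h>

/* Count the steps for two adjacent buffers of m and o elements.
   Mirrors the Perl one-liner: subtract the smaller size from the
   larger until one of them reaches zero. */
uint64_t count_steps(uint64_t m, uint64_t o)
{
    uint64_t steps = 0;
    while (m != 0 && o != 0) {
        if (m >= o)
            m -= o;
        else
            o -= m;
        ++steps;
    }
    return steps;
}

/* Same count without looping once per step: each run of repeated
   subtractions collapses into one division (hypothetical helper). */
uint64_t count_steps_fast(uint64_t m, uint64_t o)
{
    uint64_t steps = 0;
    while (m != 0 && o != 0) {
        if (m >= o) {
            steps += m / o;
            m %= o;
        } else {
            steps += o / m;
            o %= m;
        }
    }
    return steps;
}
```

Called as count_steps(9, 6) and count_steps(17, 9), this should give the same 3 and 10 steps as the two traces above. A large quotient in count_steps_fast is exactly the 'ripple' case: many steps that each move only a small remainder.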

So then I tried running it with 2^29 and 2^28-1 (but with the reporting turned off, outputting just the final number of steps):

[0] Perl> sub steps { my($i,$m,$o) = (0,@_); ++$i,( $m >= $o ? ($m -= $o) : ($o -= $m )) while $m and $o; print $i; };;
Subroutine steps redefined at (eval 17) line 1, <STDIN> line 9.

[0] Perl> steps( 2**29, 2**28-1 );;
134217731

134 million steps, with all but one moving 1 byte one place at a time. The prospects didn't look good. As you say, that'd involve 134 million 8-byte return addresses on the stack. Except it didn't. I saw no memory growth at all. Which could only mean that the compiler had tail-call optimised the recursion away. And sure enough, looking at the asm, it has. It also eliminated the duplicated y == size comparison.
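To illustrate what tail-call elimination buys here, below is a minimal sketch of a recursive block swap and the loop a compiler can turn it into. This is not the code from the thread and not the asm referred to above: the names swap_runs, blockswap_rec and blockswap_loop, and the use of int elements, are assumptions made for the example. Its step structure follows the same m -= o / o -= m pattern as the Perl counter.

```c
#include <stddef.h>

/* Exchange n elements between two non-overlapping runs. */
static void swap_runs(int *p, int *q, size_t n)
{
    while (n--) {
        int t = *p;
        *p++ = *q;
        *q++ = t;
    }
}

/* Recursive form: a[0..m) and a[m..m+o) trade places.
   Each recursive call is the last thing the function does. */
void blockswap_rec(int *a, size_t m, size_t o)
{
    if (m == 0 || o == 0)
        return;
    if (m >= o) {
        swap_runs(a, a + m, o);          /* right block lands at the front */
        blockswap_rec(a + o, m - o, o);  /* tail call */
    } else {
        swap_runs(a, a + o, m);          /* left block lands at the end */
        blockswap_rec(a, m, o - m);      /* tail call */
    }
}

/* The same algorithm with each tail call replaced by updating the
   arguments and looping: no stack growth, however many steps. */
void blockswap_loop(int *a, size_t m, size_t o)
{
    while (m != 0 && o != 0) {
        if (m >= o) {
            swap_runs(a, a + m, o);
            a += o;
            m -= o;
        } else {
            swap_runs(a, a + o, m);
            o -= m;
        }
    }
}
```

Because nothing remains to be done after either recursive call, there is no need to keep the caller's frame alive, which is what lets an optimising compiler emit the loop form. The C standard does not guarantee this transformation, so whether it happens depends on the compiler and optimisation settings.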

So, when I fed those parameters into the real code:

C:\test\C>bufswap 536870912 268435455 2  ### 2^29 2^28-1
size:536870912 offset;268435455
[         0          1 ...  268435453    268435454 ^  268435455    268435456   ...  536870910  536870911 ]
[ 268435455  268435456 ...  536870910    536870911 ^          0            1   ...  268435453  268435454 ]
iterative: swaps:536870912 took 7.359985176 secs.

[         0          1 ...  268435453    268435454 ^  268435455    268435456   ...  536870910  536870911 ]
[ 268435455  268435456 ...  536870910    536870911 ^          0            1   ...  268435453  268435454 ]
recursive: swaps:536870911 took 3.762964774 secs.

[         0          1 ...  268435453    268435454 ^  268435455    268435456   ...  536870910  536870911 ]
[ 268435455  268435456 ...  536870910    536870911 ^          0            1   ...  268435453  268435454 ]
reversive: swaps:536870911 took 4.901475821 secs.

Nada! No pathological behaviour. That one-at-a-time ripple may look/sound laborious, but it's basically a single run through memory, like copying a string, that the hardware and caches are designed to optimise for. Hence why I never bothered to test the iterative version of the algorithm that anonymonk posted above. The compiler made a better job of the conversion. (Besides, then I wouldn't have been able to call it the recursive algorithm; and I so like my 'iterative'/'recursive'/'reversive' labels :)

Most of what I thought the reversing algorithm had going for it was a) simple to code, b) sure to work without debugging, and c) you can probably fit it in about 20 bytes of code if you get sent back to 1983.

:) There is definitely something to be said for simple!
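As a reminder of just how simple, here is a minimal sketch of the well-known three-reversal approach: reverse each buffer, then reverse the whole thing. The names reverse_run and blockswap_reverse, the int element type, and the swap counter are assumptions made for the example, and it is not claimed to be the 'reversive' code that was timed above.

```c
#include <stddef.h>

/* Reverse a[0..n) in place; returns the number of swaps performed. */
static size_t reverse_run(int *a, size_t n)
{
    size_t swaps = 0;
    for (size_t i = 0, j = n; i + 1 < j; ++i, --j) {
        int t = a[i];
        a[i] = a[j - 1];
        a[j - 1] = t;
        ++swaps;
    }
    return swaps;
}

/* Make a[0..m) and a[m..m+o) trade places using three reversals.
   Reversing each block and then the whole array yields B followed by A. */
size_t blockswap_reverse(int *a, size_t m, size_t o)
{
    size_t swaps = 0;
    swaps += reverse_run(a, m);       /* A reversed            */
    swaps += reverse_run(a + m, o);   /* B reversed            */
    swaps += reverse_run(a, m + o);   /* whole array reversed  */
    return swaps;
}
```

Written this way the swap count is floor(m/2) + floor(o/2) + floor((m+o)/2). For buffers of 268435455 and 268435457 elements that works out to 536870911, which agrees with the 'reversive' figure in the run above, though that alone does not show the timed code was structured like this.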

