in reply to Re^2: Simple arithmetic? (And the winner is ... )
in thread Simple arithmetic?

AnonyMonk's implementation wins, by an order of magnitude over oiskuu's:

C:\test\C>gcm
anonyM: gcm for s=2147483648 & r=1 to 1073741824 took: 33.607916866993
oiskuu: gcm for s=2147483648 & r=1 to 1073741824 took:329.551997991197

Thanks for the sanity check AnonyMonk!

And the difference between this benchmark and the previous (unbelievable) one? One keyword:

void main( int argc, char **argv ) {
    U64 s = argc > 1 ? _atoi64( argv[ 1 ] ) : 2*GB;
    U64 r, start, end;
    U64 volatile gsm;
//      ^^^^^^^^ *** Prevent the compiler from optimising the loop away in its entirety *** D'oh!

    /* Time AnonyMonk's gcm() over r = 1 .. s/2 */
    start = __rdtsc();
    for( r = 1; r < ( s >> 1 ); ++r ) {
        gsm = gcm( s, r );
    }
    end = __rdtsc();
    /* Cycle count divided by the clock rate (Hz) gives seconds */
    printf( "anonyM: gcm for s=%I64u & r=1 to %I64u took:%.12f\n", s, s>>1, (double)( end - start ) / 2394046359.0 );

    /* Time oiskuu's lcm-based version over the same range */
    start = __rdtsc();
    for( r = 1; r < ( s >> 1 ); ++r ) {
        gsm = s - ( s % lcm( r, 4096 ) );
    }
    end = __rdtsc();
    printf( "oiskuu: gcm for s=%I64u & r=1 to %I64u took:%.12f\n", s, s>>1, (double)( end - start ) / 2394046359.0 );

    return;
}
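For anyone who wants to try the effect without MSVC, here is a minimal portable sketch of the same idea. It is not the benchmark above: round_down_to_multiple() is a hypothetical stand-in for the function under test, and the standard clock() replaces __rdtsc(), so the resolution is much coarser. All the names in it are made up for illustration. The point is only that storing each result in a volatile variable gives every iteration an observable side effect, so the compiler may not delete the loop.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the function being benchmarked (NOT the
   thread's gcm()): the largest multiple of r that does not exceed s. */
uint64_t round_down_to_multiple( uint64_t s, uint64_t r ) {
    return s - ( s % r );
}

/* Convert a tick count to seconds given the tick rate, the same way the
   post divides its rdtsc delta by a hard-coded clock frequency. */
double ticks_to_seconds( uint64_t ticks, double ticks_per_second ) {
    return (double)ticks / ticks_per_second;
}

/* Time n calls (r = 1 .. n), writing every result to a volatile sink.
   Each volatile store is an observable side effect, so the loop cannot
   be removed as dead code. Returns elapsed CPU seconds as measured by
   clock(); *last receives the final result. */
double time_with_sink( uint64_t s, uint64_t n, uint64_t *last ) {
    volatile uint64_t sink = 0;
    uint64_t r;
    clock_t start, end;

    start = clock();
    for( r = 1; r <= n; ++r ) {
        sink = round_down_to_multiple( s, r );
    }
    end = clock();

    *last = sink;
    return ticks_to_seconds( (uint64_t)( end - start ), (double)CLOCKS_PER_SEC );
}
```

Calling time_with_sink( s, n, &last ) from your own main() and printing the return value gives the elapsed time in seconds. If you drop the volatile qualifier and never read the result, an optimising compiler is free to remove the whole loop, which is presumably what happened in the earlier, unbelievable benchmark.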

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^4: Simple arithmetic? (And the (real) winner is ... )
by Anonymous Monk on Mar 08, 2015 at 23:49 UTC
    Wow. At one point I almost suspected something like that, and even tried changing the order of the algorithms, but... that didn't change anything (for some reason). And why the hell would the compiler optimize away just one loop? That's really obscure (**** compilers, how do they work? :)
      why the hell would the compiler optimize away just one loop? That's really obscure

      I have absolutely no idea. None. Zip. Nada!

      x64 asm is still fairly new to me, and the syntax and opcodes are sufficiently different from x86 to make it difficult to read at times, especially with the interleaving of opcodes to keep the pipelines busy. Even so, I've been inspecting the asm output from compilers long enough now that I can usually postulate a reason why they optimise things in a particular way; but this has me stumped.

      Why, for two essentially similar loops, one would get optimised away and the other not...

