comment on

Well that's a weird result BUK :) So I actually compiled your code (with some changes to make it suitable for gcc on linux). I was getting strange results too. 1500 nanoseconds for anonM algorithm, no matter what s was (at -O2). I'm curious why that happened. With some checksums to verify results the execution time became more realistic (with 1:10 bias towards bitshift stuff,

I understand your reaction. I had the same. I spent 4 hours trying to find something amiss. I failed.

If you look at the commented out verification code in what I posted above (with extra comments):

void main( int argc, char **argv ) {
    U64 s = argc > 1 ? _atoi64( argv[ 1 ] ) : 2*GB;
    U64 r, start, end, gsm;
//    U64 *res = malloc( sizeof(U64) * (s>>1) );                      
+  //// An array to store every result from loop 1 (anonyM)

    start = __rdtsc();
    for( r = 1; r < ( s >> 1 ); ++r ) {
//        res[ r ] =                                                  
+  //// Assign the result to the array
        gsm = gcm( s, r );
    }
    end = __rdtsc();
    printf( "anonyM: gcm for s=%I64u & r=1 to %I64u took:%.9f\n", s, s
+>>1, (double)( end - start ) / 2394046359.0 );

    start = __rdtsc();
    for( r = 1; r < ( s >> 1 ); ++r ) {
//        if( res[ r ] !=                                             
+ //// If the result from the first loop, doesn't match the result fro
+m this loop
            ( gsm = s - ( s % lcm( r, 4096 ) ) )
//        ) printf( "%I64u != %I64u\n", gsm, res[ r ] )               
+ ///// Yell about it.
        ;
    }
    end = __rdtsc();
    printf( "oiskuu: gcm for s=%I64u & r=1 to %I64u took:%.9f\n", s, s
+>>1, (double)( end - start ) / 2394046359.0 );

    return;
}
[download]

Of course, the additional code slows things down and changes the timing completely, but not a single mismatch is found.

I can't think of a more comprehensive way to verify the algorithms than to directly compare their results for a full range of inputs?

but perhaps your compiler recognizes Euclid's algorithm?

I'll admit that I do not understand quite what the optimiser has done to the code.

The Euclidean gcd() function is pretty easy to recognise in the assembler output:

_TEXT    SEGMENT
a$ = 48
b$ = 56
gcd    PROC

; 5    : U64 gcd( U64 a, U64 b ) {                      ; a in rcx, b 
+in rdx

$LN5:
    sub    rsp, 40                    ; reserve some stack space
    mov    r8, rdx                                 ; put a copy of b i
+n r8

; 6    :     return !b ? a : gcd( b, a % b );

    mov    rax, rcx                                ; put a in rax (res
+ult register)
    test    rdx, rdx                                ; test b
    je    SHORT $LN4@gcd                          ; jump to return if 
+it is 0; ie. if b=0 return a
    xor    edx, edx                                ; empty rdx ready f
+or div
    mov    rcx, r8                                 ; move b into rcx (
+ ie. b becomes a for recursive call )
    div    r8                                      ; divide rdx:rax (0
+:a) by r8 (b); leaves quotient in rax, the remainder (modulus) in rdx
+ (thus b in the recursive call)
    call    gcd                                     ; recurse
$LN4@gcd:
                                                        ; result is in
+ rax.
; 7    : }

    add    rsp, 40                    ; clean up the stack
    ret    0                                       ; return popping 0 
+bytes from the stack
gcd    ENDP
_TEXT    ENDS
[download]

~~Which is well and good, but the the code in the loop that (should be) calling this function is very confusing~~

Realisation dawns!

Without the verification code, the compiler recognises that the value calculated within the loop is never used outside of it, and simply optimises the entire loop away:

; 35   :     start = __rdtsc();

    rdtsc
    shl    rdx, 32                    ; 00000020H
    or    rax, rdx
    mov    rcx, rax

; 36   :     for( r = 1; r < ( s >> 1 ); ++r ) {
; 37   :             ( gsm = s - ( s % lcm( r, 4096 ) ) );
; 38   :     }
; 39   :     end = __rdtsc();

    rdtsc
    shl    rdx, 32                    ; 00000020H
[download]

In other words, the times output are the time it takes to make two calls to _rdtsc().

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this

In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

In reply to Re^3: Simple arithmetic? (And the winner is ... ) by BrowserUk
in thread Simple arithmetic? by BrowserUk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.