Re^6: code optimization

I did a similar thing using C:

#include <stdio.h>
#include <conio.h>      /* _getch() */
#include <windows.h>    /* GetTickCount64() */

#define ITERS 1000000000ul

int main( int argc, char **argv ) {
    ULONGLONG start;
    int i;
    double d;

    _getch();    /* wait for a keypress before starting */

    if( argc > 1 ) {    /* any command-line argument selects division */
        printf( "%lu integer divisions: ", ITERS );
        start = GetTickCount64();

        for( i = 1; i < ITERS; i++ )
            d = 1 / i;    /* int division, then conversion to double */

        printf( "Took %I64u ticks\n", GetTickCount64() - start );
    }
    else {
        printf( "%lu integer multiplications: ", ITERS );
        start = GetTickCount64();

        for( i = 1; i < ITERS; i++ )
            d = 1 * i;    /* int multiplication, then conversion to double */

        printf( "Took %I64u ticks\n", GetTickCount64() - start );
    }

    return 0;
}

And on my 64-bit processor, for 32-bit ints I got:

C:\test>muldiv-b 1
1000000000 integer divisions: Took 3432 ticks

C:\test>muldiv-b
1000000000 integer multiplications: Took 2917 ticks

The numbers vary by about ±30 ticks between individual runs, but division is always ~10% slower than multiplication. I put this down to the subsequent promotion of the result to a double rather than to the opcode itself.
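For reference, a minimal sketch of what the two loop bodies compute. In C, `1 / i` with an `int` `i` is an integer division, so the quotient is truncated before it is converted to `double`, and the same int-to-double conversion also happens in the multiplication loop. The function names are hypothetical and not part of the program above.

```c
#include <assert.h>

/* Hypothetical helper: the same expression as the division loop body.
   Both operands are int, so the division truncates toward zero and
   only then is the result converted to double. */
double int_div_then_promote(int i) {
    assert(i != 0);    /* integer division by zero is undefined */
    return 1 / i;
}

/* Hypothetical helper: the same expression as the multiplication loop
   body. The int product is converted to double in the same way. */
double int_mul_then_promote(int i) {
    return 1 * i;
}
```

For every `i` greater than 1 the division variant yields 0.0, and both variants pay for one conversion per call.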

Conversely, if I use 64-bit ints division is almost 7X slower than multiplication:

C:\test>muldiv-b
1000000000 integer multiplications: Took 3011 ticks

C:\test>muldiv-b 1
1000000000 integer divisions: Took 20764 ticks

This, I think, is due to the fact that two 64-bit registers are involved in the result.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^7: code optimization by spx2 (Deacon) on Nov 04, 2011 at 15:27 UTC
Yeah, I guess my results show division is a bit slower (on most runs; on just one of them it was actually faster). Maybe it's possible to tell the compiler not to jump to optimizations. Also, isn't 1*i a bit too easy?
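One common way to stop a C compiler from simplifying or discarding a benchmark loop is to route the operands and the result through `volatile` objects. A minimal sketch, assuming hypothetical names (`factor`, `sink`, `run_multiplications`); it is not the code that produced the timings in this thread.

```c
#include <assert.h>

/* Reads of a volatile object must really happen, so the compiler cannot
   assume the factor is the constant 1 and reduce `factor * i` to `i`. */
static volatile int factor = 1;

/* Writes to a volatile object must really happen, so the loop body
   cannot be discarded as dead code. */
static volatile int sink;

/* Hypothetical helper: multiplies 1..iters by `factor`, with one load
   of `factor` and one store to `sink` per iteration. */
void run_multiplications(int iters) {
    int i;
    assert(iters > 0);
    for (i = 1; i <= iters; i++)
        sink = factor * i;
}
```

The volatile accesses add their own cost to every iteration, so this trades one source of distortion for another.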
Re^8: code optimization by BrowserUk (Patriarch) on Nov 04, 2011 at 16:10 UTC
"Also, isn't 1*i a bit too easy?"

I assume(d) the microcode should go through the same sequence of steps regardless of the values of its operands, but maybe it is being optimised, because changing it does reduce the differential between mult & div:

C:\test>muldiv-b
1000000000 integer multiplications: (by 12345)Took 3697 ticks

C:\test>muldiv-b 1
1000000000 integer divisions:(of 12345) Took 3400 ticks

C:\test>muldiv-b
1000000000 integer multiplications: (by 12345)Took 3697 ticks

C:\test>muldiv-b 1
1000000000 integer divisions:(of 12345) Took 3432 ticks

But if it is being optimised away, for it to make so little difference to the result would mean the benchmark is totally crap. Which I now believe to be the case. The opcodes involved in running the loop are just swamping the cost of the actual mult/div opcodes, to the point where they are just noise.

The only way to really verify my memory that they take the same number of clocks would be to drop into assembler, and I'm not interested enough to do that. On pipelined processors such measurements are always iffy anyway, because the result will depend upon what else is in the pipeline, whether the processor stalls for caching, and a whole bunch of other stuff.

The bottom line is that I do not think that there is sufficient differential between mult & div to make a two-mults approach viable. And IO is going to dominate the OP's code whatever he does.
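To get a rough idea of how much of a measurement is loop overhead, one option is to time an otherwise identical loop without the operation and subtract. A minimal sketch, assuming the standard `clock()` timer instead of `GetTickCount64()` and hypothetical helper names; as noted above, pipelining and caching make any such figure an estimate at best.

```c
#include <assert.h>
#include <time.h>

static volatile long long sink;              /* stores cannot be elided */
static volatile long long operand = 12345;   /* re-read every iteration */

/* Loop overhead only: counter, compare, branch, one volatile load
   and one volatile store per iteration. */
clock_t time_baseline(long long iters) {
    long long i;
    clock_t start = clock();
    for (i = 1; i <= iters; i++)
        sink = operand;
    return clock() - start;
}

/* The same loop with one integer division added per iteration. */
clock_t time_division(long long iters) {
    long long i;
    clock_t start = clock();
    for (i = 1; i <= iters; i++)
        sink = operand / i;
    return clock() - start;
}

/* Estimated clock() ticks per iteration attributable to the operation.
   Clamped at zero because timer noise can make the difference negative. */
double net_ticks_per_op(clock_t with_op, clock_t baseline, long long iters) {
    double diff;
    assert(iters > 0);
    diff = (double)with_op - (double)baseline;
    return diff > 0.0 ? diff / (double)iters : 0.0;
}
```

A caller would run both timing functions with the same iteration count and pass the two results to `net_ticks_per_op`.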