in reply to Re^7: code optimization
in thread code optimization
Also, isn't 1*i a bit too easy?
I assume(d) the microcode should go through the same sequence of steps regardless of the values of its operands, but maybe it is being optimised, because switching to a non-trivial multiplier does reduce the differential between mult & div:
C:\test>muldiv-b
1000000000 integer multiplications: (by 12345) Took 3697 ticks

C:\test>muldiv-b 1
1000000000 integer divisions: (of 12345) Took 3400 ticks

C:\test>muldiv-b
1000000000 integer multiplications: (by 12345) Took 3697 ticks

C:\test>muldiv-b 1
1000000000 integer divisions: (of 12345) Took 3432 ticks
But if it is being optimised away, for it to make so little difference to the result would mean the benchmark is totally crap.
Which I now believe to be the case. The opcodes involved in running the loop are just swamping the cost of the actual mult/div opcodes to the point where they are just noise. The only way to really verify my memory that they take the same number of clocks would be to drop into assembler, and I'm not interested enough to do that.
On pipelined processors such measurements are always iffy anyway, because the result depends upon what else is in the pipeline, whether the processor stalls on cache misses, and a whole bunch of other stuff.
The bottom line is that I do not think that there is sufficient differential between mult & div to make a two mults approach viable.
And IO is going to dominate the OP's code whatever he does.