PerlMonks  

Re^5: code optimization

by spx2 (Deacon)
on Nov 04, 2011 at 14:08 UTC ( [id://935942]=note )


in reply to Re^4: code optimization
in thread code optimization

I tried the following test for division.
perl -MTime::HiRes=gettimeofday,tv_interval -e '$istart=1000;$iend=5000; $t0=[gettimeofday]; for $x($istart..$iend){for $y($istart..$iend){ $x/$y }}; $t1=[gettimeofday]; print "lasted->".tv_interval($t0,$t1)."\n"'
I did the same thing for multiplication. On a couple of runs for division I got the times:
  • 1.032998
  • 1.029043
  • 1.043523
  • 1.059354
  • 1.034561
  • 1.072301
  • 1.10864
  • 1.034843
On a couple of runs for multiplication I got the times:
  • 1.075402
  • 1.093403
  • 1.089273
  • 1.077661
  • 1.074203
  • 1.091646
  • 1.080421

These numbers are seconds.

I couldn't draw any conclusions from this...
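One way to read the two lists above is to compare their means and ranges rather than individual runs. The following is only a minimal sketch in C; the array and function names (`div_times`, `mul_times`, `mean`, `min_of`, `max_of`) are made up for illustration, and the values are copied from the lists above.

```c
#include <assert.h>
#include <stddef.h>

/* Run times in seconds, copied from the two lists in the post. */
const double div_times[] = { 1.032998, 1.029043, 1.043523, 1.059354,
                             1.034561, 1.072301, 1.10864,  1.034843 };
const double mul_times[] = { 1.075402, 1.093403, 1.089273, 1.077661,
                             1.074203, 1.091646, 1.080421 };

/* Arithmetic mean of n samples. */
double mean(const double *t, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Smallest of n samples. */
double min_of(const double *t, size_t n)
{
    assert(n > 0);
    double m = t[0];
    for (size_t i = 1; i < n; i++)
        if (t[i] < m)
            m = t[i];
    return m;
}

/* Largest of n samples. */
double max_of(const double *t, size_t n)
{
    assert(n > 0);
    double m = t[0];
    for (size_t i = 1; i < n; i++)
        if (t[i] > m)
            m = t[i];
    return m;
}
```

On these samples the means come out at roughly 1.052 s for division and 1.083 s for multiplication, a gap of about 3%, and the slowest division run (1.10864) is slower than the fastest multiplication run (1.074203). With ranges that overlap like this, a handful of runs probably cannot separate the two operations.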

Replies are listed 'Best First'.
Re^6: code optimization
by BrowserUk (Patriarch) on Nov 04, 2011 at 14:31 UTC

    I did a similar thing using C:

    #include <stdio.h>
    #include <conio.h>    /* getch */
    #include <windows.h>  /* GetTickCount64 */

    #define ITERS 1000000000ul

    int main( int argc, char **argv ) {
        __int64 start;
        int i;
        double d;

        getch();  /* wait for a keypress before timing starts */
        start = GetTickCount64();
        if( argc > 1 ) {  /* any command-line argument selects division */
            printf( "%lu integer divisions: ", ITERS );
            start = GetTickCount64();
            for( i = 1; i < ITERS; i++ ) d = 1 / i;
            printf( "Took %I64d ticks\n", GetTickCount64() - start );
        }
        else {            /* no argument: multiplication */
            printf( "%lu integer multiplications: ", ITERS );
            start = GetTickCount64();
            for( i = 1; i < ITERS; i++ ) d = 1 * i;
            printf( "Took %I64d ticks\n", GetTickCount64() - start );
        }
    }

    And on my 64-bit processor, for 32-bit ints I got:

    C:\test>muldiv-b 1
    1000000000 integer divisions: Took 3432 ticks

    C:\test>muldiv-b
    1000000000 integer multiplications: Took 2917 ticks

    The numbers vary by about ±30 ticks for individual runs, but division is always ~10% slower than multiplication. I put this down to the subsequent promotion of the result to a double rather than the opcode itself.
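To make the "promotion to a double" point concrete: in `d = 1 / i` both operands are ints, so C performs an integer division first and only then converts the result to double. A minimal sketch, with function names invented purely for illustration:

```c
#include <assert.h>

/* Mirrors `d = 1 / i` from the benchmark: integer division happens first,
   and only the (already truncated) int result is converted to double. */
double int_div_then_promote(int i)
{
    assert(i != 0);
    double d = 1 / i;
    return d;
}

/* For comparison: making one operand a double forces a floating-point
   division, which is a different operation from the one being timed. */
double fp_div(int i)
{
    assert(i != 0);
    return 1.0 / i;
}
```

So for every `i` greater than 1 the benchmark loop stores 0.0 in `d`; the int-to-double conversion is paid on each iteration in both the multiplication and the division loop.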

    Conversely, if I use 64-bit ints division is almost 7X slower than multiplication:

    C:\test>muldiv-b
    1000000000 integer multiplications: Took 3011 ticks

    C:\test>muldiv-b 1
    1000000000 integer divisions: Took 20764 ticks

    This, I think, is due to the fact that two 64-bit registers are involved in the result.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Yeah, I guess my results show division is a bit slower (on most runs; on just one of them it was actually faster).

      Maybe it's possible to tell the compiler to not jump to optimizations.
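One common way to keep a C compiler from removing or pre-computing the work in a timing loop is to route the operands and the result through `volatile` variables (building with optimisation disabled, e.g. `-O0` on GCC/Clang, is another). The sketch below is an assumption about how that could look, not the code used in this thread; `timed_divisions` is a hypothetical name, and it uses the portable `clock()` instead of `GetTickCount64`.

```c
#include <assert.h>
#include <time.h>

/* Divides `numerator` by 1 .. iters-1 and reports the elapsed CPU time.
   `n` and `sink` are volatile, so the compiler must reload the operand and
   store the result on every iteration instead of deleting the loop.
   Returns the last quotient so callers can check the work really ran. */
int timed_divisions(int iters, int numerator, double *seconds)
{
    assert(iters > 1 && seconds != 0);
    volatile int n = numerator;
    volatile int sink = 0;

    clock_t start = clock();
    for (int i = 1; i < iters; i++)
        sink = n / i;
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    return sink;
}
```

The volatile loads and stores are not free, so this trades one distortion (the loop vanishing) for another (extra memory traffic on every iteration).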

      Also, isn't 1*i a bit too easy ?

        Also, isn't 1*i a bit too easy ?

        I assume(d) the microcode should go through the same sequence of steps regardless of the values of its operands, but maybe it is being optimised because changing it does reduce the differential between mult & div:

        C:\test>muldiv-b
        1000000000 integer multiplications: (by 12345)Took 3697 ticks

        C:\test>muldiv-b 1
        1000000000 integer divisions:(of 12345) Took 3400 ticks

        C:\test>muldiv-b
        1000000000 integer multiplications: (by 12345)Took 3697 ticks

        C:\test>muldiv-b 1
        1000000000 integer divisions:(of 12345) Took 3432 ticks

        But if it is being optimised away, for it to make so little difference to the result would mean the benchmark is totally crap.

        Which I now believe to be the case. The opcodes involved in running the loop are just swamping the cost of the actual mult/div opcodes to the point where they are just noise. The only way to really verify my memory that they take the same number of clocks, would be to drop into assembler and I'm not interested enough to do that.
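Short of dropping into assembler, one rough way to see how much of the time is loop overhead is to time the same loop without the operation and subtract. This is only a sketch under that assumption; `enum op`, `time_loop` and `net_cost` are hypothetical names, and unsigned arithmetic is used so that the multiplication wraps instead of overflowing.

```c
#include <assert.h>
#include <time.h>

enum op { OP_NONE, OP_MUL, OP_DIV };

/* Times `iters - 1` iterations of a loop whose body is either a plain
   copy (baseline), a multiplication, or a division. The volatile operand
   and sink keep the compiler from removing the loop body. */
double time_loop(enum op which, unsigned iters)
{
    assert(iters > 1);
    volatile unsigned k = 12345u;
    volatile unsigned sink = 0;

    clock_t start = clock();
    switch (which) {
    case OP_NONE:
        for (unsigned i = 1; i < iters; i++) sink = k;      /* loop cost only */
        break;
    case OP_MUL:
        for (unsigned i = 1; i < iters; i++) sink = k * i;  /* wraps, no UB */
        break;
    case OP_DIV:
        for (unsigned i = 1; i < iters; i++) sink = k / i;
        break;
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time attributable to the operation once the baseline is subtracted. */
double net_cost(double op_seconds, double baseline_seconds)
{
    return op_seconds - baseline_seconds;
}
```

If `net_cost` comes out small relative to the baseline, or even negative on some runs, that supports the reading that the mult/div opcodes are lost in the noise of the loop itself.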

        On pipelined processors such measurements are always iffy anyway because it will depend upon what else is in the pipeline, whether the processor stalls for caching; and a whole bunch of other stuff.

        The bottom line is that I do not think that there is sufficient differential between mult & div to make a two mults approach viable.

        And IO is going to dominate the OP's code whatever he does.


