load  reg a, num1   ; move 8 bytes from memory (probably cache)
                    ; maybe 2 or 4 clock cycles barring stalls, which, given the
                    ; C array is probably contiguous memory, probably happen every 128K values
                    ; after the array is initially addressed, depending on the size of the L1/L2 caches
                    ; and what else the surrounding code is doing
load  reg b, num2   ; ditto
fmul  reg a, reg b  ; a floating-point processor instruction
                    ; depending on the processor, could be 1 to 10 or maybe 20 clock cycles
store reg a, num1   ; 8 bytes stored
                    ; another 2/4 clock cycles
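
For reference, the load/load/fmul/store sequence above corresponds roughly to one iteration of a loop like the one below. This is only a minimal sketch in C, assuming `num1` and `num2` are arrays of 8-byte `double` values and that the product is stored back into `num1`. The function name `multiply_in_place` and the length parameter `n` are hypothetical, chosen for illustration; a real compiler may also unroll or vectorise the loop, so the actual instructions can differ.

```c
#include <stddef.h>

/* Hypothetical helper: multiplies each element of num1 by the matching
   element of num2 and stores the product back into num1. */
void multiply_in_place(double *num1, const double *num2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Per iteration: load num1[i], load num2[i],
           floating-point multiply, store the result to num1[i]. */
        num1[i] = num1[i] * num2[i];
    }
}
```

Because both arrays are walked sequentially, most loads should hit cache once a line has been fetched, which is why the per-element cost is dominated by the few cycles estimated above rather than by main-memory latency.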