in reply to Re^2: Determining the minimum representable increment/decrement possible?
in thread Determining the minimum representable increment/decrement possible?

Wow! Your convergence criteria are a lot more stringent than mine ever have been, if you're trying to get below a 1-bit shift for a 53-bit precision number. (In anything I've tried to converge, I've never cared about less than ~1 ppm.)

Your offset technique to avoid the denormals sounds promising.

And thanks for the links; I'll keep them bookmarked for when I get a chance to work on my module more.


Re^4: Determining the minimum representable increment/decrement possible?
by BrowserUk (Patriarch) on Jun 16, 2016 at 00:34 UTC
    Wow! Your convergence criteria are a lot more stringent than mine ever have been, if you're trying to get below a 1-bit shift for a 53-bit precision number.

    They aren't really my criteria; I'm trying to match the results produced by the code I am optimising, prior to my optimisations.

    I've got it to the point where over 25% of the runtime is now down to exactly 2 opcodes, both 64-bit double divides in deeply nested loops.

    Luckily, the divisor in both cases is invariant in the inner loop, so going the classical route of replacing the division with a multiplication by the reciprocal of the divisor reduces the cost of both of those opcodes by just under 95%. (For my test case, from 1.17s to 0.06s.)
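
    In outline, the transformation looks something like this (a sketch only; $n, @data and the loop shape are stand-ins, not the real code):

        my $n    = 3.7;                       # stand-in for the loop-invariant divisor
        my @data = map { $_ * 0.1 } 1 .. 10;  # stand-in for the real dataset

        # Before: one 64-bit double divide per iteration.
        my @before = map { $_ / $n } @data;

        # After: the divide is hoisted out of the loop; the inner loop only multiplies.
        my $recip = 1 / $n;
        my @after = map { $_ * $recip } @data;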

    The problem is that some 20% (~90,000 of 450,000) of the resultant values differ from those produced by the unoptimised code. The maximum individual difference is less than 2e-16; but cumulatively they could have an impact on the numbers derived from the dataset. And my simple testcase is 450,000 datapoints; the real ones that got me started on this quest have upwards of 10 million datapoints. But that is only half of the story.

    I'm just discovering that a big part of the reason for the dramatic effect, in this case, of replacing x /= n; with x *= 1/n (where 1/n has been pre-calculated outside of the inner loop) is that it avoids the generation of denormal numbers. And denormal numbers are known to have a huge performance effect on FPUs.
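
    (For anyone else meeting them for the first time: a denormal -- or subnormal -- double is a non-zero value whose magnitude is below DBL_MIN, roughly 2.2e-308. A quick way to spot one from Perl; just a sketch:)

        use POSIX qw( DBL_MIN );    # smallest positive *normal* double, ~2.2250738585072014e-308

        sub is_denormal {
            my $x = shift;
            return $x != 0 && abs( $x ) < DBL_MIN;
        }

        print is_denormal( 1e-310 ) ? "denormal\n" : "normal\n";    # denormal
        print is_denormal( 1e-300 ) ? "denormal\n" : "normal\n";    # normal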

    I've only learnt this in the last 10 hrs or so. My OP was seeking to adjust the value produced by 1/n, so as to minimise the accumulated difference in the values produced by the pre- & post-optimised code. But, from the few samples of the 90,000 I've inspected so far, the differences (between x /= n and x *= 1/n) only rise above 0.5 ulp when x is a denormal value. I've never had to deal with denormals before, so this is all new ground for me; and I'm finding it particularly taxing, requiring lots of research in areas of math that go way over my head.
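
    For what it's worth, this is roughly how I'd measure those differences in ulps (a sketch only; it assumes IEEE-754 doubles on a 64-bit Perl, and the values of $n and $x are made up):

        # Distance in ulps between two doubles of the same sign: reinterpret the
        # bit patterns as 64-bit unsigned integers and subtract.
        sub ulp_diff {
            my( $a, $b ) = @_;
            my $ia = unpack 'Q', pack 'd', $a;
            my $ib = unpack 'Q', pack 'd', $b;
            return abs( $ia - $ib );
        }

        my $n = 3.7;          # made-up divisor
        my $r = 1 / $n;       # pre-calculated reciprocal
        my $x = 1e-310;       # a denormal input

        printf "%d ulp(s) apart\n", ulp_diff( $x / $n, $x * $r );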

    At least the papers do. In many cases -- as so often -- when you cut through the jargon, notation and 'math-paper-ese' that seems to be obligatory, the actual code required -- which is *never* included -- is relatively simple. E.g. pick a pdf, and then realise that it all comes down to something as simple as:

    sub NRreciprocal {
        my( $n, $g ) = @_;      # $n: value to invert; $g: initial guess at 1/$n
        my $x0 = $g;
        while( 1 ) {
            # Newton-Raphson step for f(x) = 1/x - n, i.e. x1 = x0 * ( 2 - n * x0 )
            my $x1 = $x0 + $x0 * ( 1 - $n * $x0 );
            last if ( $x0 - $x1 ) == 0.0;   # stop once the iteration no longer moves
            $x0 = $x1;
        }
        return $x0;
    }
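
    Seeded with any rough guess between 0 and 2/$n it converges quadratically; e.g. (the values here are made up, just for illustration):

        my $n      = 3.7;
        my $approx = NRreciprocal( $n, 0.25 );    # 0.25: just a rough first stab at 1/3.7
        printf "NR : %.17g\ndiv: %.17g\n", $approx, 1 / $n;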

    Why, oh why, do they insist on making everything so damn complicated?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice. Not understood.