in reply to Determining the minimum representable increment/decrement possible?
Basically, to determine the smallest delta for a given floating-point value, decompose that value's IEEE representation into sign, fraction, and exponent. My perl's NV is double-precision (53-bit precision): from MSB to LSB, a sign bit, 11 exponent bits, and 52 fractional bits (plus an implied 1), giving sign * (1 + fract_bits/2^52) * 2**exp. For that representation, the smallest change is 2**(exp - 52), so you just need to know the exp of the current value (not forgetting to subtract the bias of 1023 from the 11-bit exponent field). If your NV is more precise (lucky you), adjust accordingly.
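A rough sketch of that calculation for a 64-bit NV (not from the original post; it assumes a perl built with 64-bit integers so that pack 'Q' is available):

    use strict;
    use warnings;

    # Sketch of the idea above: pull the 11-bit biased exponent out of the
    # double's bit pattern and turn it into the size of the smallest
    # representable step at that value.
    sub min_delta {
        my $bits   = unpack 'Q', pack 'd', shift;   # native-order 64-bit pattern
        my $biased = ( $bits >> 52 ) & 0x7FF;       # the 11 exponent bits
        my $exp    = $biased ? $biased - 1023       # normal: remove the bias
                             : -1022;               # denormal/zero: fixed exponent
        return 2 ** ( $exp - 52 );                  # one unit in the last place
    }

    printf "%.17g\n", min_delta(1.0);               # 2.2204460492503131e-16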
What I'm unsure about is whether, if you take a denormal number and add a small fraction, the FPU re-normalizes it before doing the addition, or whether it's limited by the existing denormal exponent.
The last time floating point came up (in Integers sometimes turn into Reals after substraction), I started work on a module that will expand an ieee754 double-precision float into sign*(1+fract)*2^exp, based on the internal representation. Unfortunately, that module isn't ready for prime time, but I'll still link you to an Expand.pm development copy, along with debug.pl (which will eventually become my .t file(s), but for now shows how I currently use the functions). This may or may not help you delve deeper into the problem. Right now it's focused on 53-bit precision (and hampered by the fact that I want it to work on a machine at $work that is limited to perl 5.6, so I cannot use the > modifier for pack/unpack), but the same ideas should work for you.
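As a taste of what the module is aiming at (a hedged sketch only, not Expand.pm itself, and again assuming 64-bit integers rather than the 5.6-safe pack tricks the module needs):

    use strict;
    use warnings;

    # Decompose a normal double into the sign * (1 + fract) * 2**exp form
    # described above. Denormals, infinities and NaNs would need special-casing.
    sub expand {
        my $bits  = unpack 'Q', pack 'd', shift;
        my $sign  = ( $bits >> 63 ) ? -1 : 1;
        my $exp   = ( ( $bits >> 52 ) & 0x7FF ) - 1023;
        my $fract = ( $bits & 0xF_FFFF_FFFF_FFFF ) / 2 ** 52;
        return ( $sign, $fract, $exp );
    }

    my ( $s, $f, $e ) = expand( -0.75 );
    printf "%d * (1 + %g) * 2**%d\n", $s, $f, $e;   # -1 * (1 + 0.5) * 2**-1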
update: changed urls to a more "permanent" location
Re^2: Determining the minimum representable increment/decrement possible?
by BrowserUk (Patriarch) on Jun 15, 2016 at 22:38 UTC
> Basically, to determine the smallest delta for a given floating-point value, decompose that value's IEEE representation into sign, fraction, and exponent. [...] For that representation, the smallest change is 2**(exp - 52).

That's a nice insight. Thank you. If I need to go this route, that is almost certainly the way to do it.

However (I'm not certain of this yet), I think my OP question may not actually be required. The problem appears to be -- I'm still struggling with x64 assembler to try and confirm this -- that I'm encountering a lot of denormal numbers; and if I can avoid them, the underlying problem that my OP was an attempt to solve goes away.

And the problem with avoiding denormal numbers is that the purpose of the code generating them is a Newton-Raphson iteration converging on zero: the exact place where denormals raise their ugly heads. The solution may be (maybe?) to add a constant (say 1 or 2) to both sides of the equations and converge to that number instead. (Not sure about that; but it might work :)

> I started work on a module that will expand an ieee754 double-precision float into sign*(1+fract)*2^exp, based on the internal representation.

You might find Exploring IEEE754 floating point bit patterns. (and, as you're working with 5.6, the version at Re^2: Exploring IEEE754 floating point bit patterns.) interesting or even useful.
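For what it's worth, a toy sketch of what that offsetting might look like (entirely illustrative: a made-up f(x), not the code being optimised):

    use strict;
    use warnings;

    # Toy illustration of converging to an offset target: the root of f is at 0,
    # so the raw iterates would shrink forever; iterating on y = x + $C instead
    # keeps every value the loop sees near $C, and "converged" becomes a change
    # of less than 1 ulp of $C rather than a descent into the denormal range.
    my $C = 2;

    sub f  { my $x = shift; $x ** 3     }   # root at exactly 0
    sub fd { my $x = shift; 3 * $x ** 2 }   # derivative

    my $y = $C + 0.5;                       # initial guess, offset by $C
    for ( 1 .. 200 ) {
        my $x    = $y - $C;                 # map back to the original variable
        my $step = f( $x ) / fd( $x );      # ordinary Newton-Raphson step
        last if $y - $step == $y;           # step fell below 1 ulp of $y
        $y -= $step;
    }
    printf "root ~ %.3g\n", $y - $C;        # something tiny, but never denormal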
by pryrt (Abbot) on Jun 15, 2016 at 23:35 UTC
Wow! Your convergence criteria are a lot more stringent than mine have ever been, if you're trying to get below a 1-bit shift for a 53-bit precision number. (In anything I've tried to converge, I've never cared about less than ~1 ppm.) Your offset technique to avoid the denormals sounds promising. And thanks for the links; I'll keep them bookmarked for when I get a chance to work on my module more.
by BrowserUk (Patriarch) on Jun 16, 2016 at 00:34 UTC
> Wow! Your convergence criteria are a lot more stringent than mine have ever been, if you're trying to get below a 1-bit shift for a 53-bit precision number.

They aren't really my criteria; I'm trying to match the results produced by the code I am optimising, prior to my optimisations.

I've got it to the point where over 25% of the runtime is down to exactly 2 opcodes, both 64-bit double divides in deeply nested loops. Luckily, the divisor in both cases is invariant in the inner loop, so going the classical route of replacing the division with multiplication by the reciprocal of the divisor reduces the cost of both of those opcodes by just under 95%. (For my test case, from 1.17s to 0.06s.)

The problem is that some 20% (~90,000 of 450,000) of the resultant values differ from the unoptimised code. The maximum individual difference is less than 2e-16; but cumulatively they could have an impact on the numbers derived from the dataset. And my simple testcase is 450,000 datapoints; the real ones that got me started on this quest have upwards of 10 million datapoints.

But that is only half of the story. I'm just discovering that a big part of the reason for the dramatic effect, in this case, of replacing x /= n; with x *= 1/n (where 1/n has been pre-calculated outside of the inner loop) is that it avoids the generation of denormal numbers. And denormal numbers are known to have a huge performance effect on FPUs. I've only learnt this in the last 10 hrs or so.

My OP was seeking to adjust the value produced by 1/n, so as to minimise the accumulated difference in the values produced by the pre- and post-optimised code. But, from the few samples of the 90,000 I've inspected so far, the differences (between x /= n and x *= 1/n) only rise above 0.5 ulp when x is a denormal value.

I've never had to deal with denormals before, so this is all new ground for me; and I'm finding it particularly taxing, requiring lots of research in areas of math that go way over my head. At least the papers do. In many cases -- as so often -- once you cut through the jargon, notation and 'math-paper-ese' that seems to be obligatory, the actual code required -- which is *never* included -- is relatively simple. Eg. pick a pdf, and then realise that it all comes down to something as simple as:
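(Purely as an illustration of the point, not the snippet that was attached to the post: the density of the standard normal distribution, for all the notation wrapped around it in the papers, is a one-liner.)

    use strict;
    use warnings;
    use constant PI => 4 * atan2( 1, 1 );

    # The standard normal probability density function, stripped of the jargon.
    sub normal_pdf { exp( -$_[0] ** 2 / 2 ) / sqrt( 2 * PI ) }

    printf "%.6f\n", normal_pdf( 0 );    # 0.398942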
Why, oh why, do they insist on making everything so damn complicated?
by pryrt (Abbot) on Jun 30, 2016 at 23:50 UTC
I'm sure you've moved beyond this... but for future searchers: while trying to implement some ULP-manipulating routines in my Data::IEEE754::Tools development (which I used to call Expand.pm), I was googling to fix a bug, and was led to Data::Float, which actually has most of the tools I was trying to implement in my module. Using that CPAN module, I was able to write a sub which finds the ULP for any finite value (normal or denormal), and returns INF or NAN when one of those is passed.
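Something along these lines (a hedged sketch, not the actual sub) covers the cases described, using only functions Data::Float exports:

    use strict;
    use warnings;
    use Data::Float qw( float_is_nan float_is_infinite nextup nextdown );

    # Sketch only: take the ULP of $x to be the gap between $x and the next
    # representable double further from zero; hand INF and NAN straight back.
    sub ulp {
        my $x = shift;
        return $x if float_is_nan( $x ) || float_is_infinite( $x );
        return $x >= 0 ? nextup( $x ) - $x     # normals, denormals and zero
                       : $x - nextdown( $x );
    }

    printf "%.17g\n", ulp( 1.0 );    # 2.2204460492503131e-16
    printf "%.17g\n", ulp( 0 );      # ~4.94e-324, the smallest denormal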
by syphilis (Archbishop) on Jul 01, 2016 at 01:09 UTC
It also provides nextup, nextdown and nextafter functions. Be aware that the outputs you're getting provide insufficient precision - eg the approximation 4.44089209850063e-016 instead of the accurate 4.4408920985006262e-16.

Cheers, Rob
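A quick way to see the difference (assuming a 53-bit NV; perl's default stringification typically keeps only 15 significant digits, while "%.17g" is enough to round-trip a double):

    my $x = 2 ** -51;
    print "$x\n";              # 4.44089209850063e-16   (default, ~15 digits)
    printf "%.17g\n", $x;      # 4.4408920985006262e-16 (round-trip accurate)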
by BrowserUk (Patriarch) on Jul 04, 2016 at 14:21 UTC
> I was googling to fix a bug, and was led to Data::Float, which actually has most of the tools I was trying to implement in my module.

You should not let that stop you. That module gets (mostly) the right answers, but boy does it ever go about it in a convoluted way, which makes it very slow even for pure perl code. For example, this compares its nextup() and nextdown() with a couple of trivial routines I threw together when exploring this earlier:
And this compares his float_hex() routine with my asHex():
His are more than an order of magnitude slower in both cases -- mostly because he uses convoluted and complicated numerical methods rather than simple and pragmatic bitwise methods -- and certainly for my OP problem of trying to tune some floating point optimisations, that would be a show stopper. Here is the code that produced the above timings; it's just a mish-mash of bits and pieces from other scripts thrown together to test and validate that module. If any of it is useful to you, feel free to use it:
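(The full script was attached to the post as a download; the fragment below is only a hedged sketch of the bitwise style being contrasted with Data::Float's numerical approach -- names are illustrative, and ±0, infinities and NaNs are not handled.)

    use strict;
    use warnings;

    # Step a double to the next representable value toward +inf by treating
    # its 64-bit pattern as an integer (assumes a perl with 64-bit integers).
    sub nextUp {
        my $bits = unpack 'Q', pack 'd', $_[0];
        $_[0] >= 0 ? ++$bits : --$bits;
        return unpack 'd', pack 'Q', $bits;
    }

    # Show the raw bit pattern as 16 hex digits.
    sub asHex { sprintf '%016x', unpack 'Q', pack 'd', $_[0] }

    print asHex( 1.0 ), "\n";            # 3ff0000000000000
    printf "%.17g\n", nextUp( 1.0 );     # 1.0000000000000002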
by BrowserUk (Patriarch) on Jul 04, 2016 at 15:01 UTC