in reply to Re^3: subtraction issue
in thread subtraction issue

Feel free to ignore this post. I just couldn't help myself (I needed the coffee-break distraction, apparently)

Interestingly (to me), as detailed in my post in another thread, it takes n fixed-point decimal digits to exactly represent an n-bit fixed-point fractional binary number: ie, given an IEEE-754 "double", which expresses to a precision of 52 digits after the binary-point: to represent 2**-52, it takes 52 digits after the decimal point to represent it exactly in decimal. However, if you just want "good enough" to be able to distinguish between the smallest change (ULP: the smallest binary digit in the floating-point representation; the value of that bit is the smallest change you can see in a decimal number; see Re: Determining the minimum representable increment/decrement possible?) -- for that, you only need enough digits to look at the leading digit of the decimal expansion of 2**-52. To find the location (d) of the leading digit of the value of some fractional number (f), use $d = POSIX::ceil(-POSIX::log10($f)).

If you are using a scientific-notation format (%e), then you only need a precision of 16 digits (%.16e) to uniquely represent each double floating-point value. If you are using a fixed-point notation (%f), then you will need fewer than 16 digits of precision (%.16f) if your abs(number) is greater than 4, but more than 16 digits of precision if your abs(number) is less than 0.5. The code below shows some examples (shamless plug: Data::IEEE754::Tools was written to work even with older perls, where %a wouldn't help me. The to_xxx_floatingpoint() functions show you exact hex and sufficient decimal floating-point representations using p-### to show the power of two, instead of e-### to show the power of 10 -- it makes it closer to the internal IEEE-754 representation).

use warnings; use strict; use Data::IEEE754::Tools qw(to_hex_floatingpoint to_dec_floatingpoint +nextUp nextDown); use 5.022; # for %a printf "%-6.6s %30.30s %30.30s %30.30s %30.30s %30.30s\n", 'approx +', '%30.16f', '%30.16e', '%a', 'to_hex_floatingpoint()', 'to_dec_floa +tingpoint()'; foreach my $v ( map { $_ / 10. } 0 .. 100, 1280, 10240) { # 0 .. 1 +0, 128, 1024 my $d = nextDown($v); my $u = nextUp($v); printf "%-6.1f-ULP %30.16f %30.16e %30.13a %30.30s %30.30s\n", $v, + $d, $d, $d, to_hex_floatingpoint($d), to_dec_floatingpoint($d); printf "%-6.1f %30.16f %30.16e %30.13a %30.30s %30.30s\n", $v, + $v, $v, $v, to_hex_floatingpoint($v), to_dec_floatingpoint($v); printf "%-6.1f+ULP %30.16f %30.16e %30.13a %30.30s %30.30s\n", $v, + $u, $u, $u, to_hex_floatingpoint($u), to_dec_floatingpoint($u); print "\n"; } printf "%-6.6s %30.30s %30.30s %30.30s %30.30s %30.30s\n", 'approx +', '%30.16f', '%30.16e', '%a', 'to_hex_floatingpoint()', 'to_dec_floa +tingpoint()';

snippets of output:

approx %30.16f %30.1 +6e %a to_hex_floatingpoint() + to_dec_floatingpoint() ... 0.4 -ULP 0.4000000000000000 3.9999999999999997e-0 +01 0x1.9999999999999p-2 +0x1.9999999999999p-0002 + +0d1.5999999999999999p-0002 0.4 0.4000000000000000 4.0000000000000002e-0 +01 0x1.999999999999ap-2 +0x1.999999999999ap-0002 + +0d1.6000000000000001p-0002 0.4 +ULP 0.4000000000000001 4.0000000000000008e-0 +01 0x1.999999999999bp-2 +0x1.999999999999bp-0002 + +0d1.6000000000000003p-0002

With .16f, cannot tell the difference between 0.4-ULP and 0.4. However, in full scientific notation (.16e), you can.

0.5 -ULP 0.4999999999999999 4.9999999999999994e-0 +01 0x1.fffffffffffffp-2 +0x1.fffffffffffffp-0002 + +0d1.9999999999999998p-0002 0.5 0.5000000000000000 5.0000000000000000e-0 +01 0x1.0000000000000p-1 +0x1.0000000000000p-0001 + +0d1.0000000000000000p-0001 0.5 +ULP 0.5000000000000001 5.0000000000000011e-0 +01 0x1.0000000000001p-1 +0x1.0000000000001p-0001 + +0d1.0000000000000002p-0001

Format %.16f is sufficient for resolving differences between 0.5-ULP and 0.5 and 0.5+ULP.

... 4.0 -ULP 3.9999999999999996 3.9999999999999996e+0 +00 0x1.fffffffffffffp+1 +0x1.fffffffffffffp+0001 + +0d1.9999999999999998p+0001 4.0 4.0000000000000000 4.0000000000000000e+0 +00 0x1.0000000000000p+2 +0x1.0000000000000p+0002 + +0d1.0000000000000000p+0002 4.0 +ULP 4.0000000000000009 4.0000000000000009e+0 +00 0x1.0000000000001p+2 +0x1.0000000000001p+0002 + +0d1.0000000000000002p+0002 4.1 -ULP 4.0999999999999988 4.0999999999999988e+0 +00 0x1.0666666666665p+2 +0x1.0666666666665p+0002 + +0d1.0249999999999997p+0002 4.1 4.0999999999999996 4.0999999999999996e+0 +00 0x1.0666666666666p+2 +0x1.0666666666666p+0002 + +0d1.0249999999999999p+0002 4.1 +ULP 4.1000000000000005 4.1000000000000005e+0 +00 0x1.0666666666667p+2 +0x1.0666666666667p+0002 + +0d1.0250000000000001p+0002

At 4.0,rounding would show "4.000000000000000", "4.000000000000000", and "4.000000000000001" for %.15f, so you still need 16 digits of precision. But at 4.1, it would show "4.099999999999999", "4.100000000000000", and "4.100000000000001", so %.15f would be sufficient.

... 10.0 -ULP 9.9999999999999982 9.9999999999999982e+0 +00 0x1.3ffffffffffffp+3 +0x1.3ffffffffffffp+0003 + +0d1.2499999999999998p+0003 10.0 10.0000000000000000 1.0000000000000000e+0 +01 0x1.4000000000000p+3 +0x1.4000000000000p+0003 + +0d1.2500000000000000p+0003 10.0 +ULP 10.0000000000000020 1.0000000000000002e+0 +01 0x1.4000000000001p+3 +0x1.4000000000001p+0003 + +0d1.2500000000000002p+0003 128.0 -ULP 127.9999999999999900 1.2799999999999999e+0 +02 0x1.fffffffffffffp+6 +0x1.fffffffffffffp+0006 + +0d1.9999999999999998p+0006 128.0 128.0000000000000000 1.2800000000000000e+0 +02 0x1.0000000000000p+7 +0x1.0000000000000p+0007 + +0d1.0000000000000000p+0007 128.0 +ULP 128.0000000000000300 1.2800000000000003e+0 +02 0x1.0000000000001p+7 +0x1.0000000000001p+0007 + +0d1.0000000000000002p+0007 1024.0-ULP 1023.9999999999999000 1.0239999999999999e+0 +03 0x1.fffffffffffffp+9 +0x1.fffffffffffffp+0009 + +0d1.9999999999999998p+0009 1024.0 1024.0000000000000000 1.0240000000000000e+0 +03 0x1.0000000000000p+10 +0x1.0000000000000p+0010 + +0d1.0000000000000000p+0010 1024.0+ULP 1024.0000000000002000 1.0240000000000002e+0 +03 0x1.0000000000001p+10 +0x1.0000000000001p+0010 + +0d1.0000000000000002p+0010 approx %30.16f %30.1 +6e %a to_hex_floatingpoint() + to_dec_floatingpoint()

Replies are listed 'Best First'.
Re^5: subtraction issue
by syphilis (Archbishop) on Jan 28, 2017 at 01:47 UTC
    it takes n fixed-point decimal digits to exactly represent an n-bit fixed-point fractional binary number

    Thank you for correcting me ... but shouldn't that be "n+1 fixed-point decimal digits" ?
    use strict; use warnings; use Math::MPFR qw(:mpfr); Rmpfr_set_default_prec(1000); my $d = 1 / 11; my $x = Math::MPFR->new($d); print "$x\n", length($x), "\n"; # (print() removes trailing zeros.) # Outputs: 9.09090909090909116141432377844466827809810638427734375e-2 58
    Take out the "." and the "e-2" and we're left with 54 decimal digits.

    Cheers,
    Rob

      That's confusing to me, because the math says that for a fraction that's exactly representable with d digits after the binary point, f = m * 2**-d = m * 5**d / (5**d * 2**d) = m * 5**d / 10**d, which is exactly representable with d digits after the decimal point, because m * 5**d is an integer, and an integer over 10**d only needs d digits after the decimal point: 17 / 10**5 = 0.00017.

      I will try to find some time this weekend to see what's going on. 1/11 isn't exactly representable in binary, but assuming that mpfr is using the exact value of the double precision, I would think it would fit in the 52 digits after the point. Oh, wait! The 52 is actually the digits of the fraction of the mantissa after removing enough powers of 2 to get the mantissa between 1 and 2. So internally, it would be 1.4545... * 2**-4. So we've got 52 binary digits after the double floating point, plus four more digits due the power of two, which makes d = 58: the 54 SIG figs you show, plus the 2 zeroes from e-2, plus apparently 2 zeroes after the final 5 shown. (This would have been so much easier if I were at a computer with perl, not just my tablet and my emulator in my skull...)

        ... assuming that mpfr is using the exact value of the double precision ...

        It is - unless there's a bug at work.

        which makes d = 58

        Here's a couple of values that, according to mpfr, require 60 decimal digits:
        1 / 526 1.90114068441064637711435114653113487293012440204620361328125e-3 1 / 535 1.86915887850467297461032334382480257772840559482574462890625e-3
        Might be time I stopped fiddling and started thinking. (Have you got a spare one of those in-skull emulators that you could send over ?)

        Update: Fiddling is much easier, especially on a Sat'dy night. I've found numerous doubles that apparently require 122 decimal digits for exact representation, but haven't found any that require more than 122. For example:
        25 / (8360484 ** 15) 3.66833261843476555514935702258199343058161821865425601769198458144384 +41059633175331427349000780680559003454762069283595612e-103
        Cheers,
        Rob