Re^4: subtraction issue

Feel free to ignore this post. I just couldn't help myself (I needed the coffee-break distraction, apparently)

Interestingly (to me), as detailed in my post in another thread, it takes n fixed-point decimal digits to exactly represent an n-bit fixed-point fractional binary number: ie, given an IEEE-754 "double", which expresses to a precision of 52 digits after the binary-point: to represent 2**-52, it takes 52 digits after the decimal point to represent it exactly in decimal. However, if you just want "good enough" to be able to distinguish between the smallest change (ULP: the smallest binary digit in the floating-point representation; the value of that bit is the smallest change you can see in a decimal number; see Re: Determining the minimum representable increment/decrement possible?) -- for that, you only need enough digits to look at the leading digit of the decimal expansion of 2**-52. To find the location (d) of the leading digit of the value of some fractional number (f), use $d = POSIX::ceil(-POSIX::log10($f)).

If you are using a scientific-notation format (%e), then you only need a precision of 16 digits (%.16e) to uniquely represent each double floating-point value. If you are using a fixed-point notation (%f), then you will need fewer than 16 digits of precision (%.16f) if your abs(number) is greater than 4, but more than 16 digits of precision if your abs(number) is less than 0.5. The code below shows some examples (shamless plug: Data::IEEE754::Tools was written to work even with older perls, where %a wouldn't help me. The to_xxx_floatingpoint() functions show you exact hex and sufficient decimal floating-point representations using p-### to show the power of two, instead of e-### to show the power of 10 -- it makes it closer to the internal IEEE-754 representation).

use warnings;
use strict;
use Data::IEEE754::Tools qw(to_hex_floatingpoint to_dec_floatingpoint 
+nextUp nextDown);
use 5.022;  # for %a

printf "%-6.6s     %30.30s %30.30s %30.30s %30.30s %30.30s\n", 'approx
+', '%30.16f', '%30.16e', '%a', 'to_hex_floatingpoint()', 'to_dec_floa
+tingpoint()';
foreach my $v ( map { $_ / 10. } 0 .. 100, 1280, 10240) {     # 0 .. 1
+0, 128, 1024
    my $d = nextDown($v);
    my $u = nextUp($v);
    printf "%-6.1f-ULP %30.16f %30.16e %30.13a %30.30s %30.30s\n", $v,
+ $d, $d, $d, to_hex_floatingpoint($d), to_dec_floatingpoint($d);
    printf "%-6.1f     %30.16f %30.16e %30.13a %30.30s %30.30s\n", $v,
+ $v, $v, $v, to_hex_floatingpoint($v), to_dec_floatingpoint($v);
    printf "%-6.1f+ULP %30.16f %30.16e %30.13a %30.30s %30.30s\n", $v,
+ $u, $u, $u, to_hex_floatingpoint($u), to_dec_floatingpoint($u);
    print "\n";
}
printf "%-6.6s     %30.30s %30.30s %30.30s %30.30s %30.30s\n", 'approx
+', '%30.16f', '%30.16e', '%a', 'to_hex_floatingpoint()', 'to_dec_floa
+tingpoint()';
[download]

snippets of output:

approx                            %30.16f                        %30.1
+6e                             %a         to_hex_floatingpoint()     
+    to_dec_floatingpoint()
...
0.4   -ULP             0.4000000000000000        3.9999999999999997e-0
+01           0x1.9999999999999p-2       +0x1.9999999999999p-0002    +
+0d1.5999999999999999p-0002
0.4                    0.4000000000000000        4.0000000000000002e-0
+01           0x1.999999999999ap-2       +0x1.999999999999ap-0002    +
+0d1.6000000000000001p-0002
0.4   +ULP             0.4000000000000001        4.0000000000000008e-0
+01           0x1.999999999999bp-2       +0x1.999999999999bp-0002    +
+0d1.6000000000000003p-0002
[download]

With .16f, cannot tell the difference between 0.4-ULP and 0.4. However, in full scientific notation (.16e), you can.

0.5   -ULP             0.4999999999999999        4.9999999999999994e-0
+01           0x1.fffffffffffffp-2       +0x1.fffffffffffffp-0002    +
+0d1.9999999999999998p-0002
0.5                    0.5000000000000000        5.0000000000000000e-0
+01           0x1.0000000000000p-1       +0x1.0000000000000p-0001    +
+0d1.0000000000000000p-0001
0.5   +ULP             0.5000000000000001        5.0000000000000011e-0
+01           0x1.0000000000001p-1       +0x1.0000000000001p-0001    +
+0d1.0000000000000002p-0001
[download]

Format %.16f is sufficient for resolving differences between 0.5-ULP and 0.5 and 0.5+ULP.

...
4.0   -ULP             3.9999999999999996        3.9999999999999996e+0
+00           0x1.fffffffffffffp+1       +0x1.fffffffffffffp+0001    +
+0d1.9999999999999998p+0001
4.0                    4.0000000000000000        4.0000000000000000e+0
+00           0x1.0000000000000p+2       +0x1.0000000000000p+0002    +
+0d1.0000000000000000p+0002
4.0   +ULP             4.0000000000000009        4.0000000000000009e+0
+00           0x1.0000000000001p+2       +0x1.0000000000001p+0002    +
+0d1.0000000000000002p+0002

4.1   -ULP             4.0999999999999988        4.0999999999999988e+0
+00           0x1.0666666666665p+2       +0x1.0666666666665p+0002    +
+0d1.0249999999999997p+0002
4.1                    4.0999999999999996        4.0999999999999996e+0
+00           0x1.0666666666666p+2       +0x1.0666666666666p+0002    +
+0d1.0249999999999999p+0002
4.1   +ULP             4.1000000000000005        4.1000000000000005e+0
+00           0x1.0666666666667p+2       +0x1.0666666666667p+0002    +
+0d1.0250000000000001p+0002
[download]

At 4.0,rounding would show "4.000000000000000", "4.000000000000000", and "4.000000000000001" for %.15f, so you still need 16 digits of precision. But at 4.1, it would show "4.099999999999999", "4.100000000000000", and "4.100000000000001", so %.15f would be sufficient.

...
10.0  -ULP             9.9999999999999982        9.9999999999999982e+0
+00           0x1.3ffffffffffffp+3       +0x1.3ffffffffffffp+0003    +
+0d1.2499999999999998p+0003
10.0                  10.0000000000000000        1.0000000000000000e+0
+01           0x1.4000000000000p+3       +0x1.4000000000000p+0003    +
+0d1.2500000000000000p+0003
10.0  +ULP            10.0000000000000020        1.0000000000000002e+0
+01           0x1.4000000000001p+3       +0x1.4000000000001p+0003    +
+0d1.2500000000000002p+0003

128.0 -ULP           127.9999999999999900        1.2799999999999999e+0
+02           0x1.fffffffffffffp+6       +0x1.fffffffffffffp+0006    +
+0d1.9999999999999998p+0006
128.0                128.0000000000000000        1.2800000000000000e+0
+02           0x1.0000000000000p+7       +0x1.0000000000000p+0007    +
+0d1.0000000000000000p+0007
128.0 +ULP           128.0000000000000300        1.2800000000000003e+0
+02           0x1.0000000000001p+7       +0x1.0000000000001p+0007    +
+0d1.0000000000000002p+0007

1024.0-ULP          1023.9999999999999000        1.0239999999999999e+0
+03           0x1.fffffffffffffp+9       +0x1.fffffffffffffp+0009    +
+0d1.9999999999999998p+0009
1024.0              1024.0000000000000000        1.0240000000000000e+0
+03          0x1.0000000000000p+10       +0x1.0000000000000p+0010    +
+0d1.0000000000000000p+0010
1024.0+ULP          1024.0000000000002000        1.0240000000000002e+0
+03          0x1.0000000000001p+10       +0x1.0000000000001p+0010    +
+0d1.0000000000000002p+0010

approx                            %30.16f                        %30.1
+6e                             %a         to_hex_floatingpoint()     
+    to_dec_floatingpoint()
[download]

Comment on Re^4: subtraction issue Select or Download Code

Replies are listed 'Best First'.
Re^5: subtraction issue by syphilis (Archbishop) on Jan 28, 2017 at 01:47 UTC
it takes n fixed-point decimal digits to exactly represent an n-bit fixed-point fractional binary number Thank you for correcting me ... but shouldn't that be "n+1 fixed-point decimal digits" ? `use strict; use warnings; use Math::MPFR qw(:mpfr); Rmpfr_set_default_prec(1000); my $d = 1 / 11; my $x = Math::MPFR->new($d); print "$x\n", length($x), "\n"; # (print() removes trailing zeros.) # Outputs: 9.09090909090909116141432377844466827809810638427734375e-2 58` [download] Take out the "." and the "e-2" and we're left with 54 decimal digits. Cheers, Rob	[reply] [d/l]
Re^6: subtraction issue by pryrt (Abbot) on Jan 28, 2017 at 03:36 UTC
That's confusing to me, because the math says that for a fraction that's exactly representable with d digits after the binary point, f = m * 2*-d = m 5d / (5d * 2*d) = m 5d / 10d, which is exactly representable with d digits after the decimal point, because m * 5d is an integer, and an integer over 10d only needs d digits after the decimal point: 17 / 10*5 = 0.00017. I will try to find some time this weekend to see what's going on. 1/11 isn't exactly representable in binary, but assuming that mpfr is using the exact value of the double precision, I would think it would fit in the 52 digits after the point. Oh, wait! The 52 is actually the digits of the fraction of the mantissa after removing enough powers of 2 to get the mantissa between 1 and 2. So internally, it would be 1.4545... 2**-4. So we've got 52 binary digits after the double floating point, plus four more digits due the power of two, which makes d = 58: the 54 SIG figs you show, plus the 2 zeroes from e-2, plus apparently 2 zeroes after the final 5 shown. (This would have been so much easier if I were at a computer with perl, not just my tablet and my emulator in my skull...)	[reply]
Re^7: subtraction issue by syphilis (Archbishop) on Jan 28, 2017 at 06:59 UTC
... assuming that mpfr is using the exact value of the double precision ... It is - unless there's a bug at work. which makes d = 58 Here's a couple of values that, according to mpfr, require 60 decimal digits: `1 / 526 1.90114068441064637711435114653113487293012440204620361328125e-3 1 / 535 1.86915887850467297461032334382480257772840559482574462890625e-3` [download] Might be time I stopped fiddling and started thinking. (Have you got a spare one of those in-skull emulators that you could send over ?) Update: Fiddling is much easier, especially on a Sat'dy night. I've found numerous doubles that apparently require 122 decimal digits for exact representation, but haven't found any that require more than 122. For example: `25 / (8360484 ** 15) 3.66833261843476555514935702258199343058161821865425601769198458144384 +41059633175331427349000780680559003454762069283595612e-103` [download] Cheers, Rob	[reply] [d/l] [select]
Re^8: subtraction issue by pryrt (Abbot) on Jan 28, 2017 at 18:30 UTC
Re^9: subtraction issue by syphilis (Archbishop) on Jan 28, 2017 at 22:34 UTC
Some notes below your chosen depth have not been shown here
Re^8: subtraction issue by BrowserUk (Patriarch) on Jan 28, 2017 at 13:45 UTC
Re^9: subtraction issue by syphilis (Archbishop) on Jan 28, 2017 at 23:10 UTC
Some notes below your chosen depth have not been shown here