in reply to Re^6: Math::BigFloat to native double?
in thread Math::BigFloat to native double?
Re^8: Math::BigFloat to native double?
by syphilis (Archbishop) on Jul 13, 2015 at 10:07 UTC
Ok ... I thought my perl script was printing out the decimal values to full precision, but it wasn't. Corrected decimal representations are:

They still don't add up to something near the original input value - but nor should they. Whilst it's quite valid to simply add (concatenate) base 2 or base 16 values, if we want to add in base 10, we need to first convert those hex values to *106* bit precision decimal values. Expressed as decimals to 106 bits of precision, I get:

The sum of which is: 3.14159265358979323846264338327953

(This is the same as the input value, except for the extra "3" digit at the end.)

I was curious to see the actual hex values that Buk's script was producing so, on perl 5.22.0 (which provides "%a" formatting), I ran:

which outputs (when run with the "-l" switch):

So the most significant double agrees with my ppc box, but the value of the least significant double differs. Not so sure that splitting the values on the basis of the Math::BigFloat values can provide correct 106-bit values.

Cheers, Rob
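(The post's code blocks aren't shown here, but the hex-to-106-bit-decimal conversion it describes can be sketched with stock Math::BigFloat. The hex2bf helper is hypothetical, not syphilis's code; the two hex-float strings are the ones quoted later in the thread:)

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigFloat;

# 150 decimal digits is ample here: the exact decimal expansion of the
# sum of the two parts needs a little over 106 significant digits.
Math::BigFloat->accuracy(150);

# Parse a C99 hex-float ("0x1.921fb54442d18p+1") into an exact decimal.
sub hex2bf {
    my ($s) = @_;
    my ($int, $frac, $exp) =
        $s =~ /^0x([0-9a-f]+)\.([0-9a-f]*)p([+-]?\d+)$/i
        or die "not a hex-float: $s";
    # fold the fraction into the integer mantissa; each hex digit moves
    # the binary point 4 places, so lower the exponent to compensate
    my $mant = Math::BigFloat->new(Math::BigInt->from_hex($int . $frac)->bstr);
    return $mant->bmul(Math::BigFloat->new(2)->bpow($exp - 4 * length $frac));
}

my $hi = hex2bf('0x1.921fb54442d18p+1');
my $lo = hex2bf('0x1.1a62633145c07p-53');
print $hi + $lo, "\n";   # the base-10 sum of the two exact values
```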
by BrowserUk (Patriarch) on Jul 13, 2015 at 15:44 UTC
> They still don't add up to something near the original input value - but nor should they. ... if we want to add in base 10, we need to first convert those hex values to *106* bit precision decimal values.

Okay. That makes no sense to me at all. Given this comes from you, and is math, I pretty much know I'm wrong here. But ...

And now I'm wondering if your hardware double-double implementation is the same thing as the double-double I was referring to. Which, in summary, uses pairs of doubles to achieve greater precision, by splitting the values into two parts and storing the hi part and the lo part in the pair. Thus, when the two parts are added together, they form the complete value.

Now, the two parts cannot have more than 53 bits of precision each; so the idea that "if we want to add in base 10, we need to first convert those hex values to *106* bit precision decimal values" doesn't make sense to me. Where does the extra precision for each of the two values, before combining them, come from? Half of my brain is saying: this is the classic binary-representation-of-decimal-values thing, but in reverse.

> Not so sure that splitting the values on the basis of the Math::BigFloat values can provide correct 106-bit values

And the other half is saying: M::BF may not be the fastest thing in the arbitrary precision world, but it is arbitrary precision, so surely when it gets there, the results are accurate?

By now you're probably slapping your forehead saying: "Why doesn't he just install Math::MPFR!" And the answer is, I don't want an arbitrary precision or multi-precision library. I'm only using M::BF because it was on my machine and a convenient (I thought) way to test a couple of things to do with my own implementation of the double-double thing (per the paper I linked). One of which is to write my own routines for outputting in hex-float format (done) and converting to decimal. I was working on the latter when I needed to generate the binary decimal constants, and here I am. So why not just use someone else's library?

Oooh! I feel so much better for having got that off my chest :) I realise that I've probably lost your patronage for my endeavour in the process; but so be it.
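(For what it's worth, the pair's extra precision lives in the gap between the two exponents, not inside either double. A toy sketch - the values are mine, chosen only to make the point:)

```perl
use strict;
use warnings;

# Each part is an exact double; together they pin down a 107-bit value
# that no single double can represent.
my $hi = 2**60;
my $lo = 2**-46;    # 106 binary places below the leading bit of $hi

# Collapsing the pair into one double rounds $lo away completely:
printf "%a\n", $hi + $lo;    # prints 0x1p+60 (perl >= 5.22 for "%a")
```

Kept as an unevaluated pair, the value survives intact - which is why converting each part to decimal needs the full-precision expansion before the two parts can meaningfully be added in base 10.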
by syphilis (Archbishop) on Jul 14, 2015 at 03:45 UTC
It was a poorly expressed explanation.

The most significant double is 0x1.921fb54442d18p+1. (We agree on this, at least.) That string expresses an exact value, but 3.1415926535897931 is only a very poor approximation of that exact value.

The least significant double is, according to me, 0x1.1a62633145c07p-53. That string expresses an exact value, but 1.2246467991473532e-16 is only a very poor approximation of that exact value.

So ... my doubledouble contains a value that is exactly the sum of both: 0x1.921fb54442d18p+1 + 0x1.1a62633145c07p-53. But you can't expect the sum of the 2 rough decimal approximations to be equivalent to the exact hex sum of the 2 hex numbers. (And it's not, of course.)

The least significant double that your approach came up with was 0x1.3f45eb146ba31p-53. Your decimal approximation of that exact value is 0.0000000000000001384626433832795. When you add your 2 decimal approximations together you end up with the input value - but that's false comfort. The actual value contained by your doubledouble is, the way I look at them, not really the sum of the 2 decimal approximations - it's the sum of the 2 hex values: 0x1.921fb54442d18p+1 + 0x1.3f45eb146ba31p-53. That corresponds to a base 10 value of 3.141592653589793254460606851823683 (which differs significantly from the input). Your doubledouble, expressed in base 2, is:

11.001001000011111101101010100010001000010110100011000010011111101000101111010110001010001101011101000110001

which is not the correct 107-bit representation of the input value. If you use that doubledouble as your pi approximation, then you'll only get incorrect results.

FWIW, if I want to calculate the double-double representation of a specific value, I do essentially the same as you, except that I use Math::MPFR instead of Math::BigFloat. And I set precision to 2098 bits. I have:

2098 bits is overkill for the vast majority of conversions. It stems from the fact that the doubledouble can accurately represent certain values up to (but not exceeding) 2098 bits. For example, on my PPC box I can assign $x = (2 ** 1023) + (2 ** -1074); and the doubledouble $x will consist of the 2 doubles 2**1023 and 2**-1074, thereby accurately representing that 2098-bit value. The value of this capability is limited - multiply $x by (say) 1.01 and all of that additional precision is lost. The result is the same as multiplying 2**1023 by 1.01.

Anyway ... the first question is "how to set your doubledouble pi correctly using Math::BigFloat?". I couldn't quickly come up with an answer, but I'll give it some more thought later on today.

Cheers, Rob
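(syphilis's Math::MPFR script isn't shown here. A sketch of the split he describes - the Math::MPFR calls are real, but the input constant, variable names, and overall shape are my guesses:)

```perl
use strict;
use warnings;
use Math::MPFR qw(:mpfr);

# Hold the working value at 2098 bits, per the post - more than any
# double-double can carry, so nothing is lost before the rounding steps.
Rmpfr_set_default_prec(2098);

my $x = Math::MPFR->new('3.14159265358979323846264338327950288419716939937510');

my $hi = Rmpfr_get_d($x, MPFR_RNDN);    # nearest double to the input
Rmpfr_sub_d($x, $x, $hi, MPFR_RNDN);    # remainder (exact at this precision)
my $lo = Rmpfr_get_d($x, MPFR_RNDN);    # nearest double to the remainder

# Expect 0x1.921fb54442d18p+1 and 0x1.1a62633145c07p-53, per the post:
printf "%a\n%a\n", $hi, $lo;            # perl >= 5.22 for "%a"
```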
by BrowserUk (Patriarch) on Jul 14, 2015 at 05:11 UTC
by syphilis (Archbishop) on Jul 14, 2015 at 13:14 UTC
Re^8: Math::BigFloat to native double?
by BrowserUk (Patriarch) on Jul 13, 2015 at 09:19 UTC
This is what I get by adding up all the binary fractions for all the set bits in your hex/binary representation:
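(The numbers themselves aren't shown here. A hypothetical sketch of the "add up the binary fractions for the set bits" idea, applied to the 107-bit string quoted in syphilis's Jul 14 reply:)

```perl
use strict;
use warnings;
use Math::BigFloat;

# Sum 2**k for every set bit of a binary string. 150 digits of accuracy
# keeps each 2**-k term exact down to the 105th fractional bit.
Math::BigFloat->accuracy(150);

sub bin2bf {
    my ($s) = @_;
    my ($int, $frac) = split /\./, $s, 2;
    my $v = Math::BigFloat->new(0);
    my $k = length $int;            # exponent of the bit left of the point
    for my $bit (split //, $int . ($frac // '')) {
        $k--;
        $v->badd(Math::BigFloat->new(2)->bpow($k)) if $bit eq '1';
    }
    return $v;
}

# The 107-bit string quoted in the Jul 14 reply:
my $bits = '11.00100100001111110110101010001000100001011010001100'
         . '0010011111101000101111010110001010001101011101000110001';
print bin2bf($bits), "\n";   # ~3.141592653589793254460606851823683
```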
by syphilis (Archbishop) on Jul 13, 2015 at 12:14 UTC
Following are the values yielded by Math::MPFR. It's a 107-bit string, so I've capped the decimal values I've used to represent 107-bit values. There's perhaps an argument for sticking to 106-bit precision, and ignoring the 107th bit - but, in any case, it makes only a minute difference to the calculations.

Here are the calculations I've got:

It overstates the value by only 6e-33.

And the script that I used:

Cheers, Rob
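(The script itself isn't shown here. A hypothetical sketch of a 107-bit Math::MPFR calculation of the kind described - the output formats and digit counts are my choices:)

```perl
use strict;
use warnings;
use Math::MPFR qw(:mpfr);

# Work at exactly 107 bits - the combined precision of a double-double.
my $pi = Rmpfr_init2(107);
Rmpfr_const_pi($pi, MPFR_RNDN);

# The value correctly rounded to 107 bits, in decimal and in binary:
Rmpfr_printf("%.33Rg\n", $pi);
Rmpfr_printf("%Rb\n", $pi);
```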