in reply to Re^6: Math::BigFloat to native double?
in thread Math::BigFloat to native double?

Something is not right

Yeah - my hex values are correct, but their decimal approximations look decidedly suspect.
(I'll follow up as soon as I've double-checked.)

Cheers,
Rob

Replies are listed 'Best First'.
Re^8: Math::BigFloat to native double?
by syphilis (Archbishop) on Jul 13, 2015 at 10:07 UTC
    I'll follow up as soon as I've double-checked

    Ok ... I thought my perl script was printing out the decimal values to full precision, but it wasn't.

    Corrected decimal representations are:
    0x1.921fb54442d18p+1 => 3.1415926535897931 0x1.1a62633145c07p-53 => 1.2246467991473532e-16
    They still don't add up to something near the original input value - but nor should they.
    Whilst it's quite valid to simply add (concatenate) base 2 or base 16 values, if we want to add in base 10, we need to first convert those hex values to *106* bit precision decimal values.
    Expressed as decimals to 106 bits of precision, I get:
    0x1.921fb54442d18p+1 => 3.14159265358979311599796346854419 0x1.1a62633145c07p-53 => 1.2246467991473532071737640294584e-16
    The sum of which is:
    3.14159265358979323846264338327953
    (This is the same as the input value, except for the extra "3" digit at the end.)

    I was curious to see the actual hex values that Buk's script was producing so, on perl 5.22.0 (which provides "%a" formatting), I ran:
    use Math::BigFloat;; $n = Math::BigFloat->new( '3.1415926535897932384626433832795' );; $d = 0 + $n->bstr;; printf "%.17f\n", $d;; printf "%a\n", $d;; $bfd = Math::BigFloat->new( sprintf "%.17f", $d );; print $bfd;; $n -= $bfd;; print $n;; printf "%a\n", $n;;
    which outputs (when run with the "-l" switch):
    3.14159265358979310 0x1.921fb54442d18p+1 3.1415926535897931 0.0000000000000001384626433832795 0x1.3f45eb146ba31p-53
    So the most significant double agrees with my ppc box, but the value of the least significant double differs.
    Not so sure that splitting the values on the basis of the Math::BigFloat values can provide correct 106-bit values.

    Cheers,
    Rob
      They still don't add up to something near the original input value - but nor should they. ... if we want to add in base 10, we need to first convert those hex values to *106* bit precision decimal values.

      Okay. That makes no sense to me at all.

      Given this comes from you, and is math, I pretty much know I'm wrong here. But ...

      And now I'm wondering if your hardware double-double implementation is the same thing as the double-double I was referring to.

      Which in summary, uses pairs of doubles, to achieve greater precision by splitting the values into two parts and storing the hi part and the lo part in the pair. Thus, when the two parts are added together, they form the complete value.

      Now, the two parts cannot have more than 53-bits of precision each; so the idea that "if we want to add in base 10, we need to first convert those hex values to *106* bit precision decimal values." doesn't make sense to me.

      Where does the extra precision for each of the two values, before combining them, come from?

      Half of my brain is saying: this is the classic binary representation of decimal values thing; but in reverse.

      Not so sure that splitting the values on the basis of the Math::BigFloat values can provide correct 106-bit values

      And the other half is saying: M::BF may not be the fastest thing in the arbitrary precision world; but it is arbitrary precision, so surely when it gets there, the results are accurate?

      By now you're probably slapping your forehead saying: "Why doesn't he just install Math::MPFR!"

      And the answer is, I don't want an arbitrary precision or multi-precision library. I'm only using M::BF because it was on my machine and a convenient (I thought) way to test a couple of things to do with my own implementation of the double-double thing (per the paper I linked).

      One of which is to write my own routines for outputting in hex-float format (done) and converting to decimal. I was working on the latter when I need to generate the binary decimal constants and here I am.

      So why not just use someone else's library?

      1. My aversion to incorporating the kitchen sink when all I need is the plug.

        MFPR probably requires half dozen other libraries, all of which are probably monolithic monsters in their own right; and almost certainly won't compile using MSVC.

        I know you probably provide a PPM at your repository; but will it work with the version of MSVC I using. Probably not.

        So why not upgrade to a later version? Because the only version now available from MS won't install on Vista. So why not upgrade from Vista? Because it costs money. I probably will when I upgrade my motherboard and cpu sometime next year.

        So why not just use MingW? Because my clients don't use that; and because I don't like it.

      2. Ultimately, the intent is to convert the add, subtract and multiply routines that I need to assembler.

        Using, if and where possible, whatever MMX or SIMD instructions lend themselves to the purpose.

        The routines will be called from (Inline) C as the current double implementation is.

        The desire is to first get more precision (hence double-double), then more speed, by attempting to apply SIMD multiply-add to whole scan-lines of the mandelbrot set that are the background to all of this.

        So, ultimately, I don't need a Perl library (Math::MPFR); don't want the agony of trying build OSS libraries using MSVC; and won't achieve my goal by adding some huge pile of (clever) but (if typical of the breed) macro-hell, unintelligible code to my project.

      3. Finally, I'm obviously not understanding the subtleties of the way the restrictions of the machine representations are affecting the math.

        I mean; how hard can it be to add two numbers together, right. And yet; here I am struggling to understand how to add two numbers together.

        And that is all the reason -- and possibly the best reason -- for persisting in my goal of writing my own implementation of the double-double representation.

      Oooh! I feel so much better for having got that off my chest :) I realise that I've probably lost your patronage for my endeavour in the process; but so be it.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
        That makes no sense to me at all

        It was a poorly expressed explanation.
        The most significant double is 0x1.921fb54442d18p+1. (We agree on this, at least.)
        That string expresses an exact value, but 3.1415926535897931 is only a very poor approximation of that exact value.
        The least significant double is, according to me, 0x1.1a62633145c07p-53.
        That string expresses an exact value, but 1.2246467991473532e-16 is only a very poor approximation of that exact value.
        So ... my doubledouble contains a value that is exactly the sum of both:
        0x1.921fb54442d18p+1 + 0x1.1a62633145c07p-53
        But you can't expect the sum of the 2 rough decimal approximations to be equivalent to the exact hex sum of the 2 hex numbers. (And it's not, of course.)

        The least significant double that your approach came up with was 0x1.3f45eb146ba31p-53.
        Your decimal approximation of that exact value is 0.0000000000000001384626433832795.
        When you add your 2 decimal approximations together you end up with the input value - but that's false comfort.
        The actual value contained by your doubledouble is, according to the way I look at them, is not really the sum of the 2 decimal approximations - it's the sum of the 2 hex values:
        0x1.921fb54442d18p+1 + 0x1.3f45eb146ba31p-53.
        That corresponds to a base 10 value of 3.141592653589793254460606851823683 (which differs significantly from the input).
        Your doubledouble, expressed in base 2 is:
        11.001001000011111101101010100010001000010110100011000010011111101000101111010110001010001101011101000110001
        which is not the correct 107 bit representation of the input value.
        If you use that doubledouble as your pi approximation, then you'll only get incorrect results.

        FWIW, if I want to calculate the double-double representation of a specific value, I do essentially the same as you, except that I use Math::MPFR instead of Math::BigFloat.
        And I set precision to 2098 bits. I have:
        # sub dd_str takes a string as its arg and returns # the 2 doubles that form the doubledouble. # Works correctly if default Math::MPFR precision is # 2098 bits - else might return incorrect values. sub dd_str { # Set $val to the 2098 bit representation of $_[0] my $val = Math::MPFR->new($_[0]); # Most significant double is $val converted to a # double, rounding to nearest with ties to even my $msd = Rmpfr_get_d($val, MPFR_RNDN); $val -= $msd; # Least siginificant double is $val converted # to a double. return ($msd, Rmpfr_get_d($val, MPFR_RNDN)); }
        2098 bits is overkill for the vast majority of conversions. It stems from the fact that the doubledouble can accurately represent certain values up to (but not exceeding) 2098 bits.
        For example, on my PPC box I can assign  $x = (2 **1023) + (2 ** -1074); and the doubledouble $x will consist of the 2 doubles 2**1023 and 2**-1074, thereby accurately representing that 2098 bit value.
        The value of this capability is limited - multiply $x by (say) 1.01 and all of that additional precision is lost. The result is the same as multiplying 2**1023 by 1.01.

        Anyway ... first question is "how to set your doubledouble pi correctly using Math::BigFloat ?".

        I couldn't quickly come up with an answer, but I'll give it some more thought later on today.

        Cheers,
        Rob
Re^8: Math::BigFloat to native double?
by BrowserUk (Patriarch) on Jul 13, 2015 at 09:19 UTC

    This is what I get by adding up all the binary fractions for all the set bits in your hex/binary representation:

    0 0.50000000000000000000000000000000000000000000000000 1 0.25000000000000000000000000000000000000000000000000 4 0.03125000000000000000000000000000000000000000000000 7 0.00390625000000000000000000000000000000000000000000 12 0.00012207031250000000000000000000000000000000000000 13 0.00006103515625000000000000000000000000000000000000 14 0.00003051757812500000000000000000000000000000000000 15 0.00001525878906250000000000000000000000000000000000 16 0.00000762939453125000000000000000000000000000000000 17 0.00000381469726562500000000000000000000000000000000 19 0.00000095367431640625000000000000000000000000000000 20 0.00000047683715820312500000000000000000000000000000 22 0.00000011920928955078125000000000000000000000000000 24 0.00000002980232238769531300000000000000000000000000 26 0.00000000745058059692382810000000000000000000000000 30 0.00000000046566128730773926000000000000000000000000 34 0.00000000002910383045673370400000000000000000000000 39 0.00000000000090949470177292824000000000000000000000 41 0.00000000000022737367544323206000000000000000000000 42 0.00000000000011368683772161603000000000000000000000 44 0.00000000000002842170943040400700000000000000000000 48 0.00000000000000177635683940025050000000000000000000 49 0.00000000000000088817841970012523000000000000000000 54 0.00000000000000002775557561562891400000000000000000 58 0.00000000000000000173472347597680710000000000000000 59 0.00000000000000000086736173798840355000000000000000 61 0.00000000000000000021684043449710089000000000000000 64 0.00000000000000000002710505431213761100000000000000 65 0.00000000000000000001355252715606880500000000000000 69 0.00000000000000000000084703294725430034000000000000 72 0.00000000000000000000010587911840678754000000000000 73 0.00000000000000000000005293955920339377100000000000 77 0.00000000000000000000000330872245021211070000000000 78 0.00000000000000000000000165436122510605530000000000 81 0.00000000000000000000000020679515313825692000000000 82 0.00000000000000000000000010339757656912846000000000 86 0.00000000000000000000000000646234853557052870000000 88 0.00000000000000000000000000161558713389263220000000 92 0.00000000000000000000000000010097419586828951000000 94 0.00000000000000000000000000002524354896707237800000 95 0.00000000000000000000000000001262177448353618900000 96 0.00000000000000000000000000000631088724176809440000 104 0.00000000000000000000000000000002465190328815661900 105 0.00000000000000000000000000000001232595164407830900 106 0.00000000000000000000000000000000616297582203915470 0.78539816439944831161566131939647068003696135548270 <<< 001124476675678989811791619118619999856552441020 00 0 0 01 0 0.78539816439944831161566131939647068003696135548270 * 4 = 3.14159265759779324646264527758588272014784542193080 3.1415926535897932384626433832795 <<<<<<<<<<<<<<<<<<<< (my) origin +al input. ^ ^^ ^^ ^^^^^^^^

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
      From bit 24 on, you've understated the values.
      Following, are the values yielded by Math::MPFR. It's a 107-bit string, so I've capped the decimal values I've used to represent 107 bit values.
      There's perhaps an argument for sticking to 106-bit precision, and ignoring the 107th bit - but, in any case, it makes only a minute difference to the calculations.
      Here are the calculations I've got:
      C:\_32\pscrpt\math-mpfr>perl sum.pl 0 5e-1 1 2.5e-1 4 3.125e-2 7 3.90625e-3 12 1.220703125e-4 13 6.103515625e-5 14 3.0517578125e-5 15 1.52587890625e-5 16 7.62939453125e-6 17 3.814697265625e-6 19 9.5367431640625e-7 20 4.76837158203125e-7 22 1.1920928955078125e-7 24 2.98023223876953125e-8 26 7.450580596923828125e-9 30 4.656612873077392578125e-10 34 2.910383045673370361328125e-11 39 9.094947017729282379150390625e-13 41 2.27373675443232059478759765625e-13 42 1.136868377216160297393798828125e-13 44 2.8421709430404007434844970703125e-14 48 1.776356839400250464677810668945312e-15 49 8.881784197001252323389053344726562e-16 54 2.775557561562891351059079170227051e-17 58 1.734723475976807094411924481391907e-18 59 8.673617379884035472059622406959534e-19 61 2.168404344971008868014905601739883e-19 64 2.710505431213761085018632002174854e-20 65 1.355252715606880542509316001087427e-20 69 8.47032947254300339068322500679642e-22 72 1.058791184067875423835403125849552e-22 73 5.293955920339377119177015629247762e-23 77 3.308722450212110699485634768279851e-24 78 1.654361225106055349742817384139926e-24 81 2.067951531382569187178521730174907e-25 82 1.033975765691284593589260865087454e-25 86 6.462348535570528709932880406796585e-27 88 1.615587133892632177483220101699146e-27 92 1.009741958682895110927012563561966e-28 94 2.524354896707237777317531408904916e-29 95 1.262177448353618888658765704452458e-29 96 6.31088724176809444329382852226229e-30 104 2.465190328815661891911651766508707e-32 105 1.232595164407830945955825883254353e-32 106 6.162975822039154729779129416271767e-33 Sum = 7.853981633974483096156608458198765e-1 Sum * 4 = 3.141592653589793238462643383279506
      It overstates the value by only 6e-33.
      And the script that I used:
      use strict; use warnings; use Math::MPFR qw(:mpfr); # Set precision to precisely 107 bits Rmpfr_set_default_prec(107); # Set $bin_str to the 107-bit representation of # 3.1415926535897932384626433832795 my $bin_str = '1100100100001111110110101010001000100001011010001100001 +0001101001100010011000110011000101000101110000000111'; my $check = Math::MPFR->new(0); for(my $i = 0; $i < 107; $i++) { if(substr($bin_str, $i, 1)) { print "$i ", Math::MPFR->new(2 ** (-($i + 1))), "\n"; $check += 2 ** (-($i + 1)); } } print "\nSum =\n$check\nSum * 4 =\n", $check * 4, "\n";
      Cheers,
      Rob