It's been a really long time since I last worked with fixed-point arithmetic, so I decided to first do a little refresher on that:
https://www.youtube.com/watch?v=S12qx1DwjVk
https://www.youtube.com/watch?v=npQF28g6s_k
https://www.youtube.com/watch?v=7pkXlcapNB4
... and now I've run out of time and will have to get back to playing with it later :-/ But the above essentially confirms what I was thinking earlier: convert your floats in the range -4 to 4 into 128-bit ints with a scale of something like 1/2**124. Addition and subtraction remain trivial. Multiplication either requires a 256-bit intermediate result (which you then shift back down by 124 bits), or you'll have to shift your two multiplicands right first and lose some precision.
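To make the fixed-point idea a bit more concrete, here is a rough sketch in Python, which I'm using only because its arbitrary-size ints let me model the 128-bit and 256-bit values without any library. The names (`to_fixed`, `mul_wide`, `mul_preshift`, `fits_128`) are made up for illustration, and splitting the pre-shift evenly (62 bits off each multiplicand) is just one assumption about how you might do it. This is not meant as a real implementation for MSVC.

```python
from fractions import Fraction
from math import floor

FRAC_BITS = 124            # scale of 1/2**124, as suggested above
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float to a fixed-point int (exact conversion, floored)."""
    return floor(Fraction(x) * SCALE)

def to_float(v):
    """Convert a fixed-point int back to the nearest float."""
    return float(Fraction(v, SCALE))

def fits_128(v):
    """True if v fits in a signed 128-bit integer."""
    return -(1 << 127) <= v < (1 << 127)

def mul_wide(a, b):
    """Multiply via the full (up to 256-bit) product, then shift back down."""
    return (a * b) >> FRAC_BITS

def mul_preshift(a, b):
    """Hypothetical alternative: drop 62 low bits from each multiplicand
    first, so the product already has 124 fraction bits and needs no
    256-bit intermediate. The dropped low bits are the lost precision."""
    half = FRAC_BITS // 2
    return (a >> half) * (b >> half)

# Addition and subtraction are just integer + and - on the scaled values.
a = to_fixed(1.5)
b = to_fixed(2.25)
total = a + b
product = mul_wide(a, b)

# A value with a bit set at the very bottom shows the precision difference.
tiny = to_fixed(1.0) + 1          # 1 + 2**-124
wide_result = mul_wide(tiny, tiny)
preshift_result = mul_preshift(tiny, tiny)
```

With `tiny` = 1 + 2**-124, the wide multiply keeps the low-order contribution while the pre-shifted one throws it away, so the two results differ in the last couple of bits. For inputs that only carry a double's 53 bits of mantissa, the two variants will often agree.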
Your approach is certainly different from that fixed-point approach, but, mostly because I haven't had the time to fully comprehend it, I'm not yet sure where you are going with it, other than implementing your own 128-bit floating-point numbers?
BTW, in my research I also came across Math::Decimal128 and Math::Float128, although I'm not sure if those will help you in MSVC...
In reply to Re^9: Need more precision. (Updated.)
by Anonymous Monk
in thread Need more precision.
by BrowserUk