in reply to Re^7: Need more precision.
in thread Need more precision.

you've piqued my curiosity :-)

Update: I forgot the package statements. D'oh! (Code updated)

This is what I have so far. Not working yet because of an IC problem that I've encountered before, but can't remember how to fix; and the math is almost certainly wrong as is; but I threw it together to try and get a starting point:

#! perl -slw
use Config;

package F128;

use Inline C => Config => BUILD_NOISY => 1;
use Inline C => <<END_C, NAME => 'F128', CLEAN_AFTER_BUILD =>0, TYPEMAPS => './F128.typemap';

#include "../C/mytypes.h"

#define CLASS "F128"

typedef struct {
    U64 scale;
    U64 precision;
} F128;

F128 *new( I64 scale, U64 precision ) {
    F128 *n = malloc( sizeof( F128 ) );
    n->scale = scale;
    n->precision = precision;
    return n;
}

F128 *add( F128 *a, F128 *b ) {
    if( a->scale != b->scale ) {
        while( a->scale > b->scale ) {
            a->scale >>= 1;
            a->precision <<= 1;
        }
        while( a->scale < b->scale ) {
            a->scale <<= 1;
            a->precision >>= 1;
        }
    }
    a->precision += b->precision;
    return a;
}

F128 *multiply( F128 *a, F128 *b ) {
    a->precision *= b->precision;
    a->scale += b->scale;
    return a;
}

SV *toString( F128 *a ) {
    char *out = malloc( a->scale );
    U32 c = sprintf( out, "%I64ue%I64i", a->precision, a->scale );
    return newSVpv( out, c );
}

void DESTROY( F128 *a ) {
    free( a );
    return;
}

END_C

package main;

my $Fa = F128::new( -10, 45 );
my $Fb = F128::new( -10, 45 );
my $Fc = F128::new( -10, 45 );

print $Fa->toString;

$Fa->add( $Fb );
print $Fa->toString;

$Fc->multiply( $Fb );
print $Fc->toString;

The typemap:

TYPEMAP
const char *    T_PV
F128 *          O_OBJECT
U64             T_UV
I64             T_IV
U8              T_UV
U8 *            T_PV

INPUT
O_OBJECT
    if( sv_isobject($arg) && ( SvTYPE( SvRV($arg) ) == SVt_PVMG ) )
        $var = INT2PTR( $type, SvIV( (SV*)SvRV( $arg ) ) );
    else{
        warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
        XSRETURN_UNDEF;
    }

OUTPUT
# The Perl object is blessed into 'CLASS', which should be a
# char* having the name of the package for the blessing.
O_OBJECT
    sv_setref_pv( $arg, (char *)CLASS, (void*)$var );

I'm currently getting:

45e-10
90e-10
2025e-20

Which is almost there for my requirements.

And I know it's a trivial fix, but ... I can't remember what. syphilis, got your ears on?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^9: Need more precision. (Updated.)
by Anonymous Monk on Jun 10, 2015 at 01:25 UTC

    It's been a really long time since I last worked with fixed point arithmetic, so I decided to first do a little refresher on that:

    https://www.youtube.com/watch?v=S12qx1DwjVk
    https://www.youtube.com/watch?v=npQF28g6s_k
    https://www.youtube.com/watch?v=7pkXlcapNB4

    ... and now I've run out of time and will have to get back to playing with it later :-/ But the above essentially confirms what I was thinking earlier: convert your floats in the range -4 to 4 into 128-bit ints with a scale of something like 1/2**124; addition and subtraction remain trivial, multiplication either requires a 256-bit output, or you'll have to shift your two multiplicands first and lose some precision.
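To make the fixed-point idea concrete, here is a minimal C sketch of it at a reduced width, assuming a 32-bit integer with 28 fractional bits in place of the 128-bit integer with 124 fractional bits. The names `fix32`, `FRAC_BITS`, `fix_from_double`, `fix_to_double`, `fix_add`, `fix_sub` and `fix_mul_wide` are made up for illustration. The multiply shown is the "needs a wider output" variant, with a 64-bit intermediate standing in for the 256-bit one.

```c
#include <stdint.h>

/* Scaled-down stand-in for the 128-bit case: a signed 32-bit integer with
 * 28 fractional bits (scale 1/2**28), so values lie in [-8, 8). */
#define FRAC_BITS 28

typedef int32_t fix32;   /* hypothetical type name; value = raw / 2**FRAC_BITS */

/* double -> fixed: multiply by 2**FRAC_BITS (the cast truncates toward zero). */
fix32 fix_from_double(double x) {
    return (fix32)(x * (double)(1 << FRAC_BITS));
}

/* fixed -> double: divide by 2**FRAC_BITS. */
double fix_to_double(fix32 f) {
    return (double)f / (double)(1 << FRAC_BITS);
}

/* Addition and subtraction are plain integer operations.
 * The caller must keep the result inside [-8, 8). */
fix32 fix_add(fix32 a, fix32 b) { return a + b; }
fix32 fix_sub(fix32 a, fix32 b) { return a - b; }

/* Accurate multiply: widen to twice the width, multiply, then scale back.
 * The wide product carries 2*FRAC_BITS fractional bits. */
fix32 fix_mul_wide(fix32 a, fix32 b) {
    int64_t wide = (int64_t)a * (int64_t)b;
    return (fix32)(wide / ((int64_t)1 << FRAC_BITS));
}
```

For example, `fix_from_double(1.5)` yields the raw value `0x18000000`, and adding or multiplying such raw values never touches floating point.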

    Your approach is certainly different from that fixed-point approach, but mostly due to lack of time in comprehending it I'm not yet sure that I see where you are going with it other than implementing your own 128-bit floating point numbers?

    BTW, in my research I also came across Math::Decimal128 and Math::Float128, although I'm not sure if those will help you in MSVC...

      Your approach is certainly different from that fixed-point approach, but mostly due to lack of time in comprehending it I'm not yet sure that I see where you are going with it other than implementing your own 128-bit floating point numbers?

      Um. I'm just playing with it at the moment. Looking for the path of least resistance that allows me to get back to what I really want to be doing.

      I reasoned that as I have a 64-bit scale (it costs no more in computation time than using a byte; and I only have 6 of these numbers, so the extra space isn't an issue), it was easier to allow the scale to float a little than getting into doing "proper" fixed point and the need for a 128-bit register.
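As a rough sketch of what letting the scale float involves, assuming the scale is meant as a power-of-ten exponent (which the 45e-10 style output suggests): before adding, both operands have to be brought to the same exponent by scaling a mantissa by powers of ten. The type `dec_t` and the functions `dec_lower_exp`, `dec_add` and `dec_mul` below are hypothetical, not the F128 code from the earlier post, and overflow is simply reported rather than handled.

```c
#include <stdint.h>

/* Hypothetical stand-in for the F128 struct: value = mant * 10**exp10. */
typedef struct {
    int64_t  exp10;   /* decimal exponent (the post's "scale") */
    uint64_t mant;    /* unsigned mantissa (the post's "precision") */
} dec_t;

/* Rewrite d with the smaller exponent `target` by multiplying the mantissa
 * by ten once per step. Returns 0 on success, -1 if the mantissa would
 * overflow 64 bits. */
int dec_lower_exp(dec_t *d, int64_t target) {
    if (d->mant == 0) {          /* zero has the same value at any exponent */
        d->exp10 = target;
        return 0;
    }
    while (d->exp10 > target) {
        if (d->mant > UINT64_MAX / 10) return -1;
        d->mant *= 10;
        d->exp10 -= 1;
    }
    return 0;
}

/* out = a + b: align both operands to the smaller exponent, then add. */
int dec_add(dec_t a, dec_t b, dec_t *out) {
    int64_t e = a.exp10 < b.exp10 ? a.exp10 : b.exp10;
    if (dec_lower_exp(&a, e) != 0 || dec_lower_exp(&b, e) != 0) return -1;
    if (a.mant > UINT64_MAX - b.mant) return -1;   /* sum would overflow */
    out->exp10 = e;
    out->mant  = a.mant + b.mant;
    return 0;
}

/* out = a * b: mantissas multiply, exponents add. */
int dec_mul(dec_t a, dec_t b, dec_t *out) {
    if (a.mant != 0 && b.mant > UINT64_MAX / a.mant) return -1;
    out->exp10 = a.exp10 + b.exp10;
    out->mant  = a.mant * b.mant;
    return 0;
}
```

With this, 45e-10 plus 3e-8 comes out as 345e-10, and 45e-10 squared as 2025e-20. The limitation shows up quickly: every multiplication grows the mantissa, so a 64-bit mantissa overflows after a few operations unless low digits are dropped.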

      That may be short-sighted, but I've never written anything like this before, so I don't know what limitations I'm going to encounter. A new world to explore :)

      Maybe your links will educate me against it.

      Lots to watch and read. Many thanks for your help.



        So I'm still short on time but here is a proof-of-concept for fixed-point arithmetic with 128-bit ints in Perl I threw together based on the video tutorials. The same logic used would apply to any other language as well.

        I've used Math::Int128 as my 128-bit data type here, but if the target language doesn't support 128-bit ints natively, it should be possible to find a library (or implement one oneself) that abstracts out addition, subtraction, multiplication, and shifts on a custom 128-bit integer type (e.g. 2x 64-bit ints).
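A minimal sketch of such a two-halves type in portable C, for the case where the compiler offers no native 128-bit integer. The type `u128` and the function names are invented for this example and are not taken from any library. Only unsigned, modulo-2**128 behaviour is shown; addition and subtraction give the right bit patterns for two's-complement signed values as well, but the shifts here are logical, not arithmetic.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned integer built from two 64-bit halves. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128;

/* a + b modulo 2**128: add the low halves, then propagate the carry. */
u128 u128_add(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);   /* r.lo wrapped iff there was a carry */
    return r;
}

/* a - b modulo 2**128: subtract the low halves, then propagate the borrow. */
u128 u128_sub(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);
    return r;
}

/* Logical left shift; n must be in 0..127. */
u128 u128_shl(u128 a, unsigned n) {
    u128 r;
    if (n == 0) return a;
    if (n >= 64) { r.hi = a.lo << (n - 64); r.lo = 0; }
    else { r.hi = (a.hi << n) | (a.lo >> (64 - n)); r.lo = a.lo << n; }
    return r;
}

/* Logical right shift; n must be in 0..127. */
u128 u128_shr(u128 a, unsigned n) {
    u128 r;
    if (n == 0) return a;
    if (n >= 64) { r.lo = a.hi >> (n - 64); r.hi = 0; }
    else { r.lo = (a.lo >> n) | (a.hi << (64 - n)); r.hi = a.hi >> n; }
    return r;
}

/* Full 64x64 -> 128-bit product from four 32x32 partial products. */
u128 u128_mul64(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    /* Sum of everything that lands in bits 32..63, including its carries. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);
    u128 r;
    r.lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

The same partial-product pattern, applied to 64-bit limbs instead of 32-bit ones, is one way to get the 256-bit product that the accurate fixed-point multiply needs.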

        Some notes on the code: The Math::Big* modules are for conversion to/from fixed-point only; the idea being that you first bring all your numbers into the fixed-point domain and then do all your work there. The scale of 1/2**124 means there are 4 bits before the decimal point, which is enough for the stated range of -4 to 4. The multiplication method implemented in mul_fixed is the "fast but less accurate" one (shift first, then multiply), because the most accurate one requires a 256-bit output (multiply first, then shift). As explained in the video tutorial, if the multiplicands are in a known range, it's possible to play with the shifting so that less accuracy is lost.
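The trade-off in the "shift first, then multiply" method can be sketched in C at a reduced width, again assuming 28 fractional bits in a 32-bit integer instead of 124 in a 128-bit one. `fix32`, `fix_mul_fast` and `fix_mul_split` are illustrative names only. Division by a power of two is used instead of `>>` so that negative operands behave predictably, and the caller is assumed to keep the true product inside the representable range.

```c
#include <stdint.h>

#define FRAC_BITS 28          /* scaled-down analogue of the 124 used in the script */
typedef int32_t fix32;        /* hypothetical type name; value = raw / 2**FRAC_BITS */

/* Shift-first multiply with an uneven split: a loses shift_a low bits and
 * b loses the remaining FRAC_BITS - shift_a, so the product needs no wider
 * type. shift_a must be in 0..FRAC_BITS. */
fix32 fix_mul_split(fix32 a, fix32 b, int shift_a) {
    int shift_b = FRAC_BITS - shift_a;
    return (a / (1 << shift_a)) * (b / (1 << shift_b));
}

/* The even split: each operand loses FRAC_BITS/2 bits, so any operand
 * smaller in magnitude than 2**-(FRAC_BITS/2) is truncated to zero. */
fix32 fix_mul_fast(fix32 a, fix32 b) {
    return fix_mul_split(a, b, FRAC_BITS / 2);
}
```

With the even split, 3.0 times a raw value of `1 << 13` (just below the smallest usable multiplicand) gives 0, while 3.0 times `1 << 14` is exact. If one operand is known to be a whole number such as 3.0, calling `fix_mul_split` with `shift_a = FRAC_BITS` takes the entire shift from that operand and keeps every bit of the small one.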

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Math::BigFloat;
        use Math::BigInt;
        use Math::Int128 qw(int128 int128_to_hex);
        use Math::Int128::die_on_overflow;
        $\ = $/;

        # https://www.youtube.com/watch?v=S12qx1DwjVk

        my $SCALE_BITS = 124;
        my $SCALE = Math::BigInt->new(2)->bpow($SCALE_BITS);
        print " Scale: 2**$SCALE_BITS = $SCALE";

        sub number2fixed { # Math::Big* --> fixed-pt Math::Int128
            return int128( shift->copy->bmul($SCALE)->as_int->bstr );
        }
        sub fixed2number { # fixed-pt Math::Int128 --> Math::BigFloat
            return scalar Math::BigFloat->new(''.shift)->bdiv($SCALE);
        }

        my $EPSILON = fixed2number(int128(1));
        print "Epsilon: ", $EPSILON->bstr;

        print "//// example: Pi";
        my $pi1 = Math::BigFloat->bpi(45);
        print " orig: $pi1";
        my $fpi = number2fixed($pi1);
        print "fixed: ", int128_to_hex($fpi);
        my $pi2 = fixed2number($fpi);
        print " conv: $pi2";

        # https://www.youtube.com/watch?v=npQF28g6s_k

        sub mul_fixed { # 2x Math::Int128 --> Math::Int128
            return (shift>>($SCALE_BITS/2)) * (shift>>($SCALE_BITS/2));
        }

        print "//// properties of mul_fixed";
        print "smallest possible multiplicand: ",
            fixed2number(int128(1)<<($SCALE_BITS/2));
        # demo effect of the smallest possible multiplicand
        my $mul = number2fixed(Math::BigInt->new(3));
        my $toosmall = number2fixed(
            Math::BigFloat->new("0.000000000000000000216"));
        print " too small: ", int128_to_hex($toosmall);
        print "\t* 3 = ", fixed2number(mul_fixed($mul,$toosmall));
        my $justenough = number2fixed(
            Math::BigFloat->new("0.000000000000000000217") );
        print "just enough: ", int128_to_hex($justenough);
        print "\t* 3 = ", fixed2number(mul_fixed($mul,$justenough));

        print "//// test case: (45e-10)**2";
        my $val = Math::BigFloat->new("0.0000000045");
        print " input value: ", $val->bstr;
        my $fval = number2fixed($val);
        print " as fixed-pt: ", int128_to_hex($fval);
        my $p = mul_fixed($fval,$fval);
        print "fixed-pt mul: ", int128_to_hex($p);
        print " converted: ", fixed2number($p);
        print " expected: ", $val->copy->bpow(2);

        Output:

         Scale: 2**124 = 21267647932558653966460912964485513216
        Epsilon: 0.00000000000000000000000000000000000004701977403289150031874946148888982711275
        //// example: Pi
         orig: 3.14159265358979323846264338327950288419716940
        fixed: 3243F6A8885A308D313198A2E0370734
         conv: 3.141592653589793238462643383279502884184
        //// properties of mul_fixed
        smallest possible multiplicand: 0.0000000000000000002168404344971008868014905601739883422852
         too small: 00000000000000003FC07FA1F14C2C48
                * 3 = 0
        just enough: 0000000000000000400C0E7219869986
                * 3 = 0.0000000000000000006505213034913026604044716805219650268555
        //// test case: (45e-10)**2
         input value: 0.0000000045
         as fixed-pt: 00000001353CD652BB1674942F2A17B4
        fixed-pt mul: 000000000000001758BEBD8531A08964
         converted: 0.00000000000000002024999999819822726902629450354716965624
         expected: 0.00000000000000002025

        In other words: You've got a little better than 36 decimal places after the point for storage, addition and subtraction. In multiplication, as implemented here, the inputs are currently limited to a bit more than 17 decimal places.
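As a rough cross-check of those digit counts: each fractional bit is worth about log10(2) ≈ 0.301 decimal digits, so the counts follow from the number of fractional bits that survive. The two powers of two below are the "Epsilon" and "smallest possible multiplicand" values from the output, rounded.

```latex
\begin{align*}
124 \log_{10} 2 &\approx 37.3, & 2^{-124} &\approx 4.70 \times 10^{-38} \\
 62 \log_{10} 2 &\approx 18.7, & 2^{-62}  &\approx 2.17 \times 10^{-19}
\end{align*}
```

So the figures quoted above are, if anything, slightly conservative.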