in reply to Re^5: Need more precision.
in thread Need more precision.

I'm a little confused about the latter suggestion, since I'm not sure integer operations would "sort out" floating point stuff,

That's my understanding also; but I was hoping ... :)

I have native 64-bit ints; so combining two of those to give me fixed point would give huge precision; no need to move to two 128-bit integers.

But I'm fuzzy on the implementation of fixed point floats using two ints. If you come across any pointers/notes/implementations I'd love to hear of it.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^7: Need more precision.
by Anonymous Monk on Jun 10, 2015 at 00:05 UTC

    My current approach is to think about it in two steps. First, if 128-bit ints are not supported natively, one needs routines to handle normal arithmetic operations on them (so you don't need to worry whether they are 2x 64-bit ints or even 4x 32-bit ints). That shouldn't be too difficult (it's the same basic idea as doing 16-bit math on an 8-bit uC), plus there seem to be a couple of libraries out there. E.g. Boost apparently has an int128_t and bigger (http://www.boost.org/doc/libs/1_53_0/libs/multiprecision/doc/html/boost_multiprecision/tut/ints/cpp_int.html)... Math::Int128 apparently tries to use the native 128-bit ints, which, as far as I understand from what I've read so far, are not "officially" supported by MS; that would be my guess as to why the module has some CPAN Testers failures on Windows.
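As a rough illustration of that first step (the "16-bit math on an 8-bit uC" idea, one size up), here is a minimal C sketch of 128-bit addition and subtraction built from two 64-bit halves. The `u128` type and the function names are hypothetical, made up for this example; they are not taken from Boost or Math::Int128.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned integer built from two 64-bit halves. */
typedef struct {
    uint64_t hi;  /* most significant 64 bits */
    uint64_t lo;  /* least significant 64 bits */
} u128;

/* a + b modulo 2**128: add the low halves, then carry into the high halves. */
static u128 u128_add(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* unsigned wrap-around signals a carry */
    return r;
}

/* a - b modulo 2**128: subtract the low halves, then borrow from the high halves. */
static u128 u128_sub(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);  /* borrow when the low half underflows */
    return r;
}
```

Because two's-complement addition and subtraction are bit-for-bit the same as the unsigned versions, the same routines should also serve for signed fixed-point values, as long as overflow is handled by the caller.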

    Second, use those routines to implement the fixed-point stuff. Addition and subtraction are pretty trivial; multiplication would require either a temporary 256-bit integer or some rounding; still doing research on the latter... you've piqued my curiosity :-)
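To make the "wider temporary" point concrete, here is a hedged C sketch of the usual schoolbook building block: a full 64x64 -> 128-bit multiply done with 32-bit pieces, so it needs no native 128-bit type. The `u128` struct and `mul_64x64` name are assumptions for this example only. Applying the same scheme one level up (four 64x64 partial products) would give the 128x128 -> 256-bit temporary mentioned above.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type: two 64-bit halves. */
typedef struct {
    uint64_t hi;  /* most significant 64 bits */
    uint64_t lo;  /* least significant 64 bits */
} u128;

/* Full 64x64 -> 128-bit unsigned product via four 32x32 -> 64 partial products. */
static u128 mul_64x64(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;  /* weight 2**0  */
    uint64_t p1 = a_lo * b_hi;  /* weight 2**32 */
    uint64_t p2 = a_hi * b_lo;  /* weight 2**32 */
    uint64_t p3 = a_hi * b_hi;  /* weight 2**64 */

    /* Sum everything that lands in bits 32..63; this cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    u128 r;
    r.lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

The alternative to keeping the full product is to shift both multiplicands right before multiplying, which keeps the result in the narrower type but throws away their low-order bits.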

      you've piqued my curiosity :-)

      Update: I forgot the package statements. D'oh! (Code updated)

      This is what I have so far. Not working yet because of an IC problem that I've encountered before, but can't remember how to fix; and the math is almost certainly wrong as is; but I threw it together to try and get a starting point:

      #! perl -slw
      use Config;

      package F128;

      use Inline C => Config => BUILD_NOISY => 1;
      use Inline C => <<END_C, NAME => 'F128', CLEAN_AFTER_BUILD =>0, TYPEMAPS => './F128.typemap';

      #include "../C/mytypes.h"

      #define CLASS "F128"

      typedef struct {
          U64 scale;
          U64 precision;
      } F128;

      F128 *new( I64 scale, U64 precision ) {
          F128 *n = malloc( sizeof( F128 ) );
          n->scale = scale;
          n->precision = precision;
          return n;
      }

      F128 *add( F128 *a, F128 *b ) {
          if( a->scale != b->scale ) {
              while( a->scale > b->scale ) {
                  a->scale >>= 1;
                  a->precision <<= 1;
              }
              while( a->scale < b->scale ) {
                  a->scale <<= 1;
                  a->precision >>= 1;
              }
          }
          a->precision += b->precision;
          return a;
      }

      F128 *multiply( F128 *a, F128 *b ) {
          a->precision *= b->precision;
          a->scale += b->scale;
          return a;
      }

      SV *toString( F128 *a ) {
          char *out = malloc( a->scale );
          U32 c = sprintf( out, "%I64ue%I64i", a->precision, a->scale );
          return newSVpv( out, c );
      }

      void DESTROY( F128 *a ) {
          free( a );
          return;
      }

      END_C

      package main;

      my $Fa = F128::new( -10, 45 );
      my $Fb = F128::new( -10, 45 );
      my $Fc = F128::new( -10, 45 );

      print $Fa->toString;

      $Fa->add( $Fb );
      print $Fa->toString;

      $Fc->multiply( $Fb );
      print $Fc->toString;

      The typemap:

      TYPEMAP
      const char *    T_PV
      F128 *          O_OBJECT
      U64             T_UV
      I64             T_IV
      U8              T_UV
      U8 *            T_PV

      INPUT
      O_OBJECT
          if( sv_isobject($arg) && ( SvTYPE( SvRV($arg) ) == SVt_PVMG ) )
              $var = INT2PTR( $type, SvIV( (SV*)SvRV( $arg ) ) );
          else{
              warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
              XSRETURN_UNDEF;
          }

      OUTPUT
      # The Perl object is blessed into 'CLASS', which should be a
      # char* having the name of the package for the blessing.
      O_OBJECT
          sv_setref_pv( $arg, (char *)CLASS, (void*)$var );

      I'm currently getting:

      45e-10
      90e-10
      2025e-20

      Which is almost there for my requirements.

      And I know it's a trivial fix, but ... I can't remember what. syphilis, got your ears on?



        It's been a really long time since I last worked with fixed point arithmetic, so I decided to first do a little refresher on that:

        https://www.youtube.com/watch?v=S12qx1DwjVk
        https://www.youtube.com/watch?v=npQF28g6s_k
        https://www.youtube.com/watch?v=7pkXlcapNB4

        ... and now I've run out of time and will have to get back to playing with it later :-/ But the above essentially confirms what I was thinking earlier: convert your floats in the range -4 to 4 into 128-bit ints with a scale of something like 1/2**124; addition and subtraction remain trivial, multiplication either requires a 256-bit output, or you'll have to shift your two multiplicands first and lose some precision.
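A small C sketch of the conversion step described above, under stated assumptions: a signed two's-complement 128-bit value stored as two 64-bit words, with a scale of 1/2**124, so the representable range is [-8, 8) and comfortably covers -4 to 4. The `fix128` type and both function names are hypothetical, and the code relies on C99 hexadecimal float literals. It is meant as a starting point rather than a finished implementation.

```c
#include <stdint.h>

/* Hypothetical fixed-point type: value = (hi * 2**64 + lo) / 2**124. */
typedef struct {
    int64_t  hi;  /* sign bit, 3 integer bits, top 60 fraction bits */
    uint64_t lo;  /* remaining 64 fraction bits */
} fix128;

/* Convert a double in [-8, 8) to fixed point. The caller must keep x in range. */
static fix128 fix128_from_double(double x) {
    double scaled = x * 0x1p60;          /* exact: only the exponent changes */
    int64_t whole = (int64_t)scaled;     /* truncates toward zero */
    if ((double)whole > scaled)
        whole--;                         /* turn truncation into floor for negatives */

    fix128 r;
    r.hi = whole;
    /* The leftover fraction is in [0, 1); scale it up to fill the low word. */
    r.lo = (uint64_t)((scaled - (double)whole) * 0x1p64);
    return r;
}

/* Convert back to double; this rounds, since a double holds only 53 bits. */
static double fix128_to_double(fix128 v) {
    return (double)v.hi * 0x1p-60 + (double)v.lo * 0x1p-124;
}
```

A double only carries 53 significant bits, so converting from a double fills at most the top part of the 124 fraction bits; the extra precision only pays off once values are combined in fixed-point arithmetic.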

        Your approach is certainly different from that fixed-point approach, but I haven't had the time to fully comprehend it, so I'm not yet sure where you are going with it, other than implementing your own 128-bit floating point numbers?

        BTW, in my research I also came across Math::Decimal128 and Math::Float128, although I'm not sure if those will help you in MSVC...