http://qs1969.pair.com?node_id=483339


in reply to Re^4: Algorithm for cancelling common factors between two lists of multiplicands
in thread Algorithm for cancelling common factors between two lists of multiplicands

Sorry, I did not mean to say you did not attribute the idea; I was just curious because you mentioned the implementation.

I don't think I explained it very clearly. The idea I was trying to convey was that if you have lists of factorials, then we should be able to find the best way to subtract them.

Let's consider your example

    a b c d
    ---------
    v w x y z

Suppose you sort the numerator and denominator separately you might get:

    d c b a
    ---------
    z y x w v

i.e. d > c > b > a and z > y > x > w > v.
Under this situation, subtracting d & z, c & y, b & x, a & w and leaving v as-is will be the best ordering possible; in other words, there is no other subtraction process that will give us a smaller total number of elements.

I left it at "subtract (d & z)" because d can be > z or d can be < z, so the result will go into either the numerator or the denominator. But that's simple logic to check and assign correctly.

What I set out to prove was just that, i.e. if you sort the lists and subtract the like indices, then you are guaranteed (yet to be proved rigorously) to get the least possible number of elements after one round of cancellation! I do not think I was able to prove it in a sophisticated way, but it does seem to suggest the claim is correct... maybe it will be easier to run some test cases to actually check whether the conjecture is true :)

If the above conjecture is true or at least intuitive then there is no ambiguity in the way subtraction should be done!
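
Something like this little sketch is what I have in mind (just an illustration of the conjecture, with made-up example numbers, not a real FET implementation):

    # Illustration only: sort both lists of factorial arguments, pair them
    # index-for-index, and keep only the factors that survive cancellation.
    use strict;
    use warnings;

    sub cancel_factorials {
        my ( $num, $den ) = @_;            # refs to lists of factorial arguments
        my @n = sort { $b <=> $a } @$num;
        my @d = sort { $b <=> $a } @$den;

        my ( @top, @bot );                 # surviving individual factors
        while ( @n and @d ) {
            my ( $p, $q ) = ( shift @n, shift @d );
            if    ( $p > $q ) { push @top, $q + 1 .. $p }   # p!/q! = (q+1)*...*p on top
            elsif ( $q > $p ) { push @bot, $p + 1 .. $q }   # q!/p! goes underneath
            # equal arguments cancel completely
        }
        push @top, map { 2 .. $_ } @n;     # unpaired args stay as whole factorials
        push @bot, map { 2 .. $_ } @d;
        return ( \@top, \@bot );
    }

    my ( $top, $bot ) = cancel_factorials( [ 12, 9, 5, 3 ], [ 8, 6, 4, 2, 1 ] );
    print "numerator:   @$top\ndenominator: @$bot\n";

Each sorted pair then contributes a short run of integers to one side only, which is exactly the "no better subtraction" claim above.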

cheers

SK

Re^6: Algorithm for cancelling common factors between two lists of multiplicands (192.5 ms)
by BrowserUk (Patriarch) on Aug 13, 2005 at 01:53 UTC

    I never doubted the idea, nor your proof, I simply had real trouble coding it. There's a load of niggly edge cases that I could not seem to get right--but now I have.

    As a result, the original test case that took M::BF 4 1/2 hours, and that M::Pari couldn't handle at all, I now have down to 192 milliseconds in pure Perl! What's more, it happily handles a dataset of (1e6 2e6 2e6 1e6) in just over 1 minute without blowing the memory (though I cannot check the accuracy, as I have nothing else that will touch those numbers).

    Of course, I've given up a fair amount of precision along the way, so now I am wondering if I can enhance my cheap BigFloat code to recover some of it?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      As a precision reference, here's the 1e5/2e5 answer computed with my program. It ought to be good for 60 digits:
        [thor@redwood fishers-exact-test]$ cat fetbig.dat
        100000 200000 200000 100000
        [thor@redwood fishers-exact-test]$ time ./fet <fetbig.dat
        +1.324994596496999433060120009448386330459655228210366819887079e-14760

        real    21m8.445s
        user    20m56.788s
        sys     0m6.831s
      It looks like your FET6 yields about 13 digits of precision for this case:
         1.32499459649680040e-14760
        +1.324994596496999433060120009448386330459655228210366819887079e-14760
      Here is my implementation and the results with a benchmark. It is 45-50% faster. I am not sure at exactly which digit the result becomes unreliable, so it could be slightly off.

      My Perl skills are not that great, so my implementation is very simple and straightforward.

      tmoertel's haskell implementation amazes me on the precision! I wonder if that is specific to Haskell or the way it was coded

      My implementation

        tmoertel's haskell implementation amazes me on the precision! I wonder if that is specific to Haskell or the way it was coded
        The reason the implementation has such precision is not because it is written in Haskell (although that does make coding more convenient) but because it computes the answer exactly, as a ratio of bigints, and then extracts decimal digits until the desired precision has been obtained. If you want more or fewer than sixty digits, for example, just change 60 to the desired number in the sciformat 60 call in the code in Re^13: Algorithm for cancelling common factors between two lists of multiplicands.
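
        In Perl terms the digit-extraction step could be sketched roughly as follows with Math::BigInt (just the idea, not the Haskell code; digits_of_ratio is a made-up name):

            # Sketch: pull decimal digits out of an exact ratio $num/$den,
            # assuming 0 <= $num < $den (e.g. the fractional part of a p-value).
            use strict;
            use warnings;
            use Math::BigInt;

            sub digits_of_ratio {
                my ( $num, $den, $ndigits ) = @_;    # Math::BigInt objects
                my $rem = $num->copy;
                my @digits;
                for ( 1 .. $ndigits ) {
                    $rem->bmul( 10 );
                    my ( $q, $r ) = $rem->copy->bdiv( $den );   # quotient, remainder
                    push @digits, $q->bstr;
                    $rem = $r;
                }
                return join '', @digits;
            }

            # 1/7 to 20 places: 14285714285714285714
            print digits_of_ratio( Math::BigInt->new(1), Math::BigInt->new(7), 20 ), "\n";

        Because the numerator and denominator are exact, the precision is whatever you ask for; each extra digit costs one more multiply and divide.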

        Cheers,
        Tom

        I think most (though not all) of your gain is through avoiding the overhead of calling subroutines in loops. Inlining is the last step when trying to squeeze out the very last drop of performance. That relates back to my opinion on Perl 5's greatest limitation is...?.
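
        Something along these lines (a toy comparison using the core Benchmark module, not the FET code itself) shows the size of that overhead:

            # The same modular product computed through a helper sub and inlined.
            use strict;
            use warnings;
            use Benchmark qw(cmpthese);

            sub mul { return $_[0] * $_[1] }

            my @nums = ( 1 .. 1000 );

            cmpthese( -2, {
                subcall => sub { my $p = 1; $p = mul( $p, $_ ) % 1_000_003 for @nums; },
                inlined => sub { my $p = 1; $p = ( $p * $_ ) % 1_000_003 for @nums; },
            } );

        The difference reported there is pure call overhead; the arithmetic is identical in both.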

        tmoertel's haskell implementation amazes me on the precision!

        The precision comes from using Haskell's arbitrary-precision Integer data type (analogous to Math::BigInt) for the product and division calculations, once the cancelling has been carried out.

        The performance at that precision comes from the compiled implementation and a highly optimising compiler. The only reason the Perl code manages to beat the compiled Haskell is that it uses the much lower-precision, FPU-native double representation.

        You might find the following code interesting: it's a brute-force conversion of Tom's Haskell code, staying as close to the original as I could manage. It works okay for the small dataset, but it highlights two major differences between Perl and Haskell.

        The highly recursive nature of the algorithm exacerbates the cost of Perl's sub calls and its lack of tail-call elimination.

        And if you try to use it on the larger datasets, you'll see that the Perl implementation consumes vast amounts of memory (eventually segfaulting on my machine when it runs out of swap space). Most of the damage is done in building and rebuilding zillions of lists whilst keeping copies of earlier versions on the stack in the merge & cancel functions. It's this type of recursive, divide & copy, list processing that Haskell's ubiquitous lazy operations really excel at.

        Not particularly useful given its limitations, but an interesting exercise nonetheless.
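
        To make that copy-per-call pattern concrete, here is a toy merge written in the same recursive style (my illustration, not the actual conversion), next to the flat loop Perl is much happier running:

            # Haskell-style recursive merge: every call copies both tails and
            # builds a brand-new list on the way back up.
            use strict;
            use warnings;

            sub merge_rec {
                my ( $x, $y ) = @_;
                return [ @$y ] unless @$x;
                return [ @$x ] unless @$y;
                return $x->[0] <= $y->[0]
                    ? [ $x->[0], @{ merge_rec( [ @$x[ 1 .. $#$x ] ], $y ) } ]
                    : [ $y->[0], @{ merge_rec( $x, [ @$y[ 1 .. $#$y ] ] ) } ];
            }

            # The same result in one pass: no tail copies, no deep call stack.
            sub merge_iter {
                my ( $x, $y ) = @_;
                my @out;
                my ( $i, $j ) = ( 0, 0 );
                while ( $i < @$x and $j < @$y ) {
                    push @out, $x->[$i] <= $y->[$j] ? $x->[ $i++ ] : $y->[ $j++ ];
                }
                push @out, @$x[ $i .. $#$x ], @$y[ $j .. $#$y ];
                return \@out;
            }

        Haskell shares those tails rather than copying them, so the first shape costs next to nothing there; in Perl every one of those [ ... ] constructors is a real copy.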


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.