
### Re^4: Algorithm for cancelling common factors between two lists of multiplicands

by BrowserUk (Patriarch)
on Aug 12, 2005 at 12:19 UTC

Sorry sk. I try to always attribute ideas, but it's been a long thread, several days, and I've had other conversations beyond the thread. I may have misremembered the sequence of things.

For the algorithm, the problem is the implementation, not the sort'n'subtract. With 4 values on top and 5 underneath, there are numerous ways in which the subtraction can be done, i.e.

 a b c d                (b-w)(d-y)      (a-v)(c-x)                1
---------  might give  -----------  or -----------  or --------------------- or ...
v w x y z              (v-a)(x-c)z     (w-b)(y-d)z     (v-a)(w-b)(x-c)(y-d)z

And for low numbers, the time spent ordering and eliminating outweighs the benefits.

It's just a case of coding the mechanism in an economical fashion.
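A minimal sketch of one way to code that mechanism (in Python for illustration; the thread's code is Perl, and the function names here are my own). Each factorial is represented by its argument; after sorting both sides, a paired n!/m! with n > m collapses to the integer range (m+1)..n on whichever side is larger.

```python
def sort_n_subtract(num_args, den_args):
    """Reduce prod(n! for n in num_args) / prod(d! for d in den_args)
    to two flat lists of surviving integer factors."""
    num_args = sorted(num_args, reverse=True)
    den_args = sorted(den_args, reverse=True)
    num_factors, den_factors = [], []
    for n, d in zip(num_args, den_args):
        if n > d:                          # n!/d! leaves (d+1)..n on top
            num_factors.extend(range(d + 1, n + 1))
        elif d > n:                        # ...or (n+1)..d underneath
            den_factors.extend(range(n + 1, d + 1))
        # equal arguments cancel completely
    for n in num_args[len(den_args):]:     # unpaired numerator factorials
        num_factors.extend(range(2, n + 1))
    for d in den_args[len(num_args):]:     # unpaired denominator factorials
        den_factors.extend(range(2, d + 1))
    return num_factors, den_factors

# 5! 7! / (3! 6! 8!): pairs (7,8) and (5,6), leaving 3! unpaired below
num, den = sort_n_subtract([5, 7], [3, 6, 8])
# num == [], den == [8, 6, 2, 3] -- and indeed 5!7!/(3!6!8!) == 1/288
```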

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" may be good enough for the now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

Replies are listed 'Best First'.
Re^5: Algorithm for cancelling common factors between two lists of multiplicands
by tmoertel (Chaplain) on Aug 12, 2005 at 16:40 UTC
For the algorithm, the problem is the implementation not the sort'n'subtract. With 4 values on top and 5 underneath, there are numerous ways in which the subtraction needs to be done.
That's why I simply enumerated each factorial's terms in increasing order and merged the streams, canceling in passing. It was a heck of a lot easier than writing a fast, robust representation of intervals, and yet this crude implementation is "fast enough": Less than 17 percent of my code's time is spent on this task (see profile report, below). Were I to eliminate this time entirely, I would gain only a one-fifth improvement in overall speed, and that fact suggests that this line of optimization has petered out.
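The merge-and-cancel strategy described above can be rendered roughly as follows (the real implementation is Haskell; this Python sketch and its function names are mine). Each factorial n! becomes the sorted stream 2..n; all numerator streams are merged into one sorted stream, likewise the denominator, and equal terms are cancelled as the two merged streams are consumed.

```python
import heapq

def merged(args):
    # lazily merge the already-sorted per-factorial streams
    return heapq.merge(*(iter(range(2, n + 1)) for n in args))

def cancelled_ratio(num_args, den_args):
    """Surviving (numerator, denominator) factors of
    prod(n! for n in num_args) / prod(d! for d in den_args)."""
    num, den = merged(num_args), merged(den_args)
    a, b = next(num, None), next(den, None)
    num_out, den_out = [], []
    while a is not None and b is not None:
        if a == b:                 # common factor: cancel in passing
            a, b = next(num, None), next(den, None)
        elif a < b:
            num_out.append(a)
            a = next(num, None)
        else:
            den_out.append(b)
            b = next(den, None)
    while a is not None:           # drain whichever side remains
        num_out.append(a)
        a = next(num, None)
    while b is not None:
        den_out.append(b)
        b = next(den, None)
    return num_out, den_out

# the same 5!7!/(3!6!8!) = 1/288 example: everything on top cancels away
num_out, den_out = cancelled_ratio([5, 7], [3, 6, 8])
# num_out == [], den_out == [2, 3, 6, 8]
```

Because the streams are consumed lazily, only the current head of each stream is live at any moment, which is consistent with the near-constant-space behaviour noted below.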

Interesting tidbit: the entire pre-multiplication pipeline accounts for only 6 percent of the overall allocation cost, suggesting that the lazy build-and-merge-and-cancel operation works in near-constant space.

Cheers,
Tom

                                 individual       inherited
COST CENTRE           entries   %time  %alloc   %time  %alloc
MAIN                        0     0.0     0.0   100.0   100.0
 main                       1     0.0     0.0    90.5    82.4
  pCutoff                   1     0.0     0.0    90.5    82.4
   toRatio                  1    16.7     0.0    73.8    76.4
    bigProduct              2    57.1    76.3    57.1    76.3
   rpCutoff                 1     0.0     0.0    16.7     6.0   <--- here
    rdivide                 1     0.0     0.0     0.0     0.0
     rtimes                 1     0.0     0.0     0.0     0.0
    rtimes                  0     0.0     0.0     4.8     0.7
     rdivide                1     0.0     0.0     4.8     0.7
      cancel           113163     4.8     0.7     4.8     0.7
       merge                2     0.0     0.0     0.0     0.0
    facproduct              2     0.0     0.0    11.9     5.3
     fac                    9     4.8     2.8     4.8     2.8
     rproduct               2     0.0     0.0     7.1     2.5
      rtimes                9     0.0     0.0     7.1     2.5
       cancel               9     0.0     0.0     0.0     0.0
        merge          294870     7.1     2.5     7.1     2.5
 CAF                        1     0.0     0.0     9.5    17.6
  sciformat                 3     0.0     0.0     9.5    17.6
   tosci                 7031     9.5    17.4     9.5    17.6
    digits                122     0.0     0.2     0.0     0.2

Note: All pre-multiplication logic, including merging and canceling, is contained within the inherited time marked "here".
Re^5: Algorithm for cancelling common factors between two lists of multiplicands
by sk (Curate) on Aug 12, 2005 at 16:57 UTC
Sorry, I did not mean to say you did not attribute the idea; I was just curious because you mentioned the implementation.

I don't think I explained it very clearly. The idea I was trying to convey was: if you have lists of factorials, then we should be able to find the best way to subtract them.

 a b c d
---------
v w x y z

Suppose you sort the numerator and denominator separately; you might get

 d c b a
---------
z y x w v

i.e. d > c > b > a and z > y > x > w > v.
Under this situation, subtracting d & z, c & y, b & x, a & w, and leaving v as-is will be the best ordering possible; in other words, there is no better subtraction process that will give us a smaller total number of elements.

I left it at subtracting (d & z) because d can be > z or d can be < z, so the result will go into either the numerator or the denominator. But that's simple logic to check and assign correctly.

What I set out to prove was just that, i.e. if you sort the lists and subtract like indices, then you are guaranteed (? yet to be proved rigorously) to get the least possible number of elements after one round of cancellation! I do not think I was able to prove it in a sophisticated way, but the evidence does suggest it is correct... maybe it will be easier to run some test cases to check whether the conjecture is true :)

If the above conjecture is true, or at least intuitive, then there is no ambiguity in the way the subtraction should be done!
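The conjecture lends itself to exactly the kind of test-case check suggested above. A sketch (my own harness, in Python): pairing arguments n and d leaves |n - d| factors after cancellation, while an unpaired argument m keeps all m - 1 factors of m!; the claim is that pairing the two sorted lists index-by-index minimizes the total surviving factor count over every possible pairing.

```python
from itertools import permutations
import random

def cost(short, perm):
    # factors surviving when short[i] is paired with perm[i],
    # and the remaining perm entries stay as whole factorials
    c = sum(abs(a - b) for a, b in zip(short, perm))
    return c + sum(m - 1 for m in perm[len(short):])

def sorted_pairing_cost(xs, ys):
    short, long_ = sorted([xs, ys], key=len)
    return cost(sorted(short, reverse=True), sorted(long_, reverse=True))

def brute_force_best(xs, ys):
    # try every assignment of the longer list's slots
    short, long_ = sorted([xs, ys], key=len)
    short = sorted(short, reverse=True)
    return min(cost(short, perm) for perm in permutations(long_))

random.seed(1)
for _ in range(200):
    xs = random.sample(range(2, 30), 4)
    ys = random.sample(range(2, 30), 5)
    # sorted pairing never loses to any other pairing
    assert sorted_pairing_cost(xs, ys) == brute_force_best(xs, ys)
```

This is the classic result that matching two sorted sequences minimizes the sum of absolute differences, with each unpaired slot behaving like a phantom pairing against 1.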

cheers

SK

I never doubted the idea, nor your proof; I simply had real trouble coding it. There's a load of niggly edge cases that I could not seem to get right--but now I have.

As a result, the original test case that took M::BF 4 1/2 hours, and that M::Pari couldn't handle, I now have down to 192 milliseconds in pure Perl! What's more, it happily handles a dataset of (1e6 2e6 2e6 1e6) in just over 1 minute without blowing the memory (though I cannot check the accuracy, as I have nothing else that will touch those numbers).

Of course, I've given up a fair amount of precision along the way, so now I am wondering if I can enhance my cheap BigFloat code to recover some of it?
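One common arrangement for a "cheap BigFloat" (an assumption about the general technique on my part, not necessarily BrowserUk's code) is to carry a double mantissa plus a separate integer decimal exponent, renormalizing after every operation so the mantissa stays in [1, 10). That dodges overflow/underflow on enormous products like these, but precision stays capped at the ~15-16 significant digits of a double; recovering more digits would need a wider mantissa (e.g. double-double or exact integer arithmetic). A Python sketch:

```python
import math

def normalize(m, e):
    # fold any magnitude drift in the mantissa into the exponent
    if m == 0.0:
        return 0.0, 0
    shift = math.floor(math.log10(abs(m)))
    return m / 10.0 ** shift, e + shift

def fmul(a, b):
    return normalize(a[0] * b[0], a[1] + b[1])

def fdiv(a, b):
    return normalize(a[0] / b[0], a[1] - b[1])

# multiply 1000! in, then divide it back out: the huge intermediate
# exponent (~2567) never troubles the representation
x = (1.0, 0)
for k in range(2, 1001):
    x = fmul(x, normalize(float(k), 0))
for k in range(2, 1001):
    x = fdiv(x, normalize(float(k), 0))
# x is now 1.0 * 10**0 up to accumulated rounding error
```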

As a precision reference, here's the 1e5/2e5 answer computed with my program. It ought to be good for 60 digits:
[thor@redwood fishers-exact-test]$ cat fetbig.dat
100000 200000 200000 100000
[thor@redwood fishers-exact-test]$ time ./fet <fetbig.dat
1.324994596496999433060120009448386330459655228210366819887079e-14760

real    21m8.445s
user    20m56.788s
sys     0m6.831s
It looks like your FET6 yields about 13 digits of precision for this case:
1.32499459649680040e-14760
1.324994596496999433060120009448386330459655228210366819887079e-14760
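As a quick way to quantify that agreement, one can count the leading significant digits the two printed mantissas share (a small helper of my own, not from the thread):

```python
def matching_digits(a, b):
    # compare significant digits of two numbers in scientific notation
    da = a.split('e')[0].replace('+', '').replace('.', '').lstrip('0')
    db = b.split('e')[0].replace('+', '').replace('.', '').lstrip('0')
    n = 0
    for x, y in zip(da, db):
        if x != y:
            break
        n += 1
    return n

fet6  = "1.32499459649680040e-14760"
exact = "1.324994596496999433060120009448386330459655228210366819887079e-14760"
# the two mantissas agree on their first 13 significant digits
```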
Here is my implementation and results with a benchmark. It is 45-50% faster. I am not sure at which exact digit the result becomes unreliable, so it could be slightly off.

My Perl skills are not that great, so my implementation is very simple and straightforward.

tmoertel's Haskell implementation amazes me with its precision! I wonder if that is specific to Haskell or to the way it was coded.

My implementation

Node Type: note [id://483258]