http://qs1969.pair.com?node_id=483185


in reply to Re^2: Algorithm for cancelling common factors between two lists of multiplicands
in thread Algorithm for cancelling common factors between two lists of multiplicands

I am not sure if you had a chance to look at my node Re^7: Algorithm for cancelling common factors between two lists of multiplicands

I guess my approach is the same as Limbic~Region's, as I was subtracting the lower factorial terms from the higher ones.

I copied the input list from one of your previous examples, and I have 45700 instead of 4570, but aside from that the implementation should be easy to understand.

I guess if you sort the numerator and denominator and subtract them you should be all set. I have not proved this formally, but here is a stab at a simple proof showing that sorting and subtracting should work...

Let the fraction be

    X! Y! Z!
    --------
    a! b! c!

WLOG let's assume that X > Y > Z and a > b > c. Also let's assume that b > Y and X > a (weaker assumption: Z > c).

NOTE: if we divide X!/a! we will have (X-a) elements.

To prove: (X-a) + (Z-c) + (b-Y) is the shortest list one can find, or in other words

    (X-a) + (Z-c) + (b-Y) <= (X-p) + (Z-q) + (r-Y)

for any permutation (p,q,r) of (a,b,c).

Proof: From the above equation, since b > Y and Z > c, r must equal either a or b. If r = b then the claim is trivial. If r = a then we get

    (X-a) + (Z-c) + (b-Y) ?<= (X-b) + (Z-c) + (a-Y)

Canceling terms:

    -a - c + b ?<= -b - c + a
         -a + b ?<= -b + a    ====> YES, since a > b

so r = a does not give the smallest list, hence r = b. Similarly we can also show that

    (X-a) + (Z-c) + (b-Y) <= (X-a) + (Y-c) + (b-Z)
I don't think this is a rigorous proof of the method, but I feel that sorting and subtracting should give us what we need...
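To make the sort-and-subtract idea concrete, here is a minimal Python sketch (the thread's code is Perl and Haskell; the function name and structure here are my own): sort both factorial lists descending, pair them index by index, and reduce each x!/a! to the surviving terms min(x,a)+1 .. max(x,a), sending them to the numerator or denominator depending on which argument was larger.

```python
def cancel_factorials(num, den):
    """Sort-and-subtract sketch: pair the sorted factorial arguments
    and reduce each x!/a! to the terms min(x,a)+1 .. max(x,a)."""
    num, den = sorted(num, reverse=True), sorted(den, reverse=True)
    top, bottom = [], []
    for x, a in zip(num, den):
        lo, hi = min(x, a), max(x, a)
        # x!/a! leaves |x - a| terms on whichever side was larger
        (top if x > a else bottom).extend(range(lo + 1, hi + 1))
    for x in num[len(den):]:          # unpaired factorials expand fully
        top.extend(range(2, x + 1))
    for a in den[len(num):]:
        bottom.extend(range(2, a + 1))
    return top, bottom
```

For example, `cancel_factorials([5], [3])` leaves [4, 5] on top and nothing below, since 5!/3! = 4·5.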

cheers

SK

PS: I think there will be 47448 elements and not 47444 as you suggested, as you need to count the first element too.

Re^4: Algorithm for cancelling common factors between two lists of multiplicands
by BrowserUk (Patriarch) on Aug 12, 2005 at 12:19 UTC

    Sorry sk. I try to always attribute ideas, but it's been a long thread, several days and I've had other conversations beyond the thread. I may have misremembered the sequence of things.

    For the algorithm, the problem is the implementation, not the sort'n'subtract. With 4 values on top and 5 underneath, there are numerous ways in which the subtraction can be done. I.e.

    a b c d              (b-w)(d-y)      (a-v)(c-x)               1
    --------- might give ----------- or ----------- or --------------------- or ...
    v w x y z            (v-a)(x-c)z     (w-b)(y-d)z    (v-a)(w-b)(x-c)(y-d)z

    And for low numbers, the time spent ordering and eliminating outweighs the benefits.

    It's just a case of coding the mechanism in an economical fashion.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      For the algorithm, the problem is the implementation not the sort'n'subtract. With 4 values on top and 5 underneath, there are numerous ways in which the subtraction needs to be done.
      That's why I simply enumerated each factorial's terms in increasing order and merged the streams, canceling in passing. It was a heck of a lot easier than writing a fast, robust representation of intervals, and yet this crude implementation is "fast enough": Less than 17 percent of my code's time is spent on this task (see profile report, below). Were I to eliminate this time entirely, I would gain only a one-fifth improvement in overall speed, and that fact suggests that this line of optimization has petered out.

      Interesting tidbit: The entire pre-multiplication pipeline accounts for only 6 percent of the overall allocation cost, suggesting that the lazy build-and-merge-and-cancel operation works in near-constant space.
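      The enumerate-and-merge approach described above can be sketched in Python (a rough translation of the idea, not of the original Haskell; the names are mine): each factorial lazily yields its terms 2..n, the streams are merged in sorted order, and a single pass over the two merged streams cancels equal terms as they meet.

```python
import heapq

def factorial_terms(ns):
    """Lazily merge the terms 2..n of each n! into one ascending stream."""
    return heapq.merge(*(iter(range(2, n + 1)) for n in ns))

def cancel(num_terms, den_terms):
    """Walk two ascending streams in lockstep, cancelling equal terms
    in passing; unmatched terms survive on their own side."""
    top, bottom = [], []
    a, b = next(num_terms, None), next(den_terms, None)
    while a is not None and b is not None:
        if a == b:                    # common factor: drop both
            a, b = next(num_terms, None), next(den_terms, None)
        elif a < b:
            top.append(a); a = next(num_terms, None)
        else:
            bottom.append(b); b = next(den_terms, None)
    while a is not None:
        top.append(a); a = next(num_terms, None)
    while b is not None:
        bottom.append(b); b = next(den_terms, None)
    return top, bottom
```

      For example, 4!·3!/5! reduces to (2·3)/5 after one merged cancellation pass.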

      Cheers,
      Tom

                                      individual       inherited
      COST CENTRE        entries   %time  %alloc    %time  %alloc
      MAIN                     0     0.0     0.0    100.0   100.0
       main                    1     0.0     0.0     90.5    82.4
        pCutoff                1     0.0     0.0     90.5    82.4
         toRatio               1    16.7     0.0     73.8    76.4
          bigProduct           2    57.1    76.3     57.1    76.3
          rpCutoff             1     0.0     0.0     16.7     6.0   <--- here
           rdivide             1     0.0     0.0      0.0     0.0
            rtimes             1     0.0     0.0      0.0     0.0
           rtimes              0     0.0     0.0      4.8     0.7
            rdivide            1     0.0     0.0      4.8     0.7
             cancel       113163     4.8     0.7      4.8     0.7
              merge            2     0.0     0.0      0.0     0.0
           facproduct          2     0.0     0.0     11.9     5.3
            fac                9     4.8     2.8      4.8     2.8
            rproduct           2     0.0     0.0      7.1     2.5
             rtimes            9     0.0     0.0      7.1     2.5
              cancel           9     0.0     0.0      0.0     0.0
               merge      294870     7.1     2.5      7.1     2.5
      CAF                      1     0.0     0.0      9.5    17.6
       sciformat               3     0.0     0.0      9.5    17.6
        tosci               7031     9.5    17.4      9.5    17.6
         digits              122     0.0     0.2      0.0     0.2

      Note: All pre-multiplication logic, including merging and canceling, is contained within the inherited time marked "here".
      Sorry, I did not mean to say that you did not attribute the idea; I was just curious because you mentioned the implementation.

      I don't think I explained it very clearly. The idea I was trying to convey was that if you have a list of factorials, then we should be able to find the best way to subtract them.

      Let's consider your example

      a b c d
      ---------
      v w x y z

      Suppose you sort the numerator and denominator separately; you might get:

      d c b a
      ---------
      z y x w v

      i.e. d > c > b > a and z > y > x > w > v.
      Under this situation, subtracting d & z, c & y, b & x, a & w, and leaving v as-is will be the best ordering possible; in other words, there is no subtraction order that gives a smaller total number of elements.

      I left it at "subtract d & z" because d can be > z or d can be < z, so the difference will go into either the numerator or the denominator. But that's simple logic to check and assign correctly.

      What I set out to prove was just that, i.e. if you sort the lists and subtract like indices, then you are guaranteed (yet to be proved rigorously) to get the least possible number of elements after one round of cancellation! I was not able to prove it in a sophisticated way, but the argument suggests it is correct... maybe it will be easier to run some test cases to check whether the conjecture is true :)
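      Running test cases, as suggested, is easy to automate. Here is a hypothetical brute-force checker in Python (the function names are mine): it counts the surviving terms for the sorted pairing and compares that against every possible pairing of the two lists.

```python
from itertools import permutations

def residual_count(num, den):
    """Surviving term count for one pairing: |x - a| terms per pair,
    plus n - 1 terms (2..n) for each unpaired n!."""
    k = min(len(num), len(den))
    paired = sum(abs(x - a) for x, a in zip(num, den))
    leftover = sum(n - 1 for n in num[k:]) + sum(n - 1 for n in den[k:])
    return paired + leftover

def sorted_is_optimal(num, den):
    """Brute-force check of the conjecture: pairing the descending-sorted
    lists index by index minimizes the residual count over all pairings."""
    num, den = sorted(num, reverse=True), sorted(den, reverse=True)
    best = min(residual_count(list(pn), list(pd))
               for pn in permutations(num) for pd in permutations(den))
    return residual_count(num, den) == best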

      If the above conjecture is true, or at least intuitive, then there is no ambiguity in the way the subtraction should be done!

      cheers

      SK

        I never doubted the idea, nor your proof; I simply had real trouble coding it. There were a load of niggly edge cases that I could not seem to get right--but now I have.

        As a result, the original test case that took M::BF 4 1/2 hours, and that M::Pari couldn't handle, I now have down to 192 milliseconds in pure Perl! What's more, it happily handles a dataset of (1e6 2e6 2e6 1e6) in just over 1 minute without blowing the memory (though I cannot check the accuracy, as I have nothing else that will touch those numbers).

        Of course, I've given up a fair amount of precision along the way, so now I am wondering if I can enhance my cheap BigFloat code to recover some of it?

