http://qs1969.pair.com?node_id=482292


in reply to Algorithm for cancelling common factors between two lists of multiplicands

BrowserUk,
The order and pairings you choose to cancel out affect the result. This may prevent optimizations that seek to cancel out terms in a different order (such as largest terms first). For instance in your example, it took 6 cancelings to get the resut. Here is another possibility that doesn't result in prime factorization but only takes 3 pairings:
# After duplicates have been removed # @a = ( 20, 33, 60 ); # @b = ( 2, 5, 12, 16, 23 ); 60 / 12 = 5 / 1 # @a = ( 20, 33, 5 ); # @b = ( 2, 5, 16, 23 ); 20 / 5 = 4 / 1 # @a = ( 4, 33, 5 ); # @b = ( 2, 16, 23 ); 16 / 4 = 4 / 1 # @a = ( 33, 5 ); # @b = ( 2, 4, 23 );

My algorithm doesn't require prime factorization (just GCD).

#!/usr/bin/perl use strict; use warnings; use Inline 'C'; my (%set_a, %set_b); ++$set_a{$_} for 10, 20, 33, 45, 60; ++$set_b{$_} for 2, 5, 10, 12, 16, 23, 45; cancel_out(); # Assume %set_a and %set_b at package scope print "$_\t$set_a{$_}\n" for keys %set_a; print "\n\n\n"; print "$_\t$set_b{$_}\n" for keys %set_b; sub cancel_out { my $finished; while ( ! $finished ) { $finished = 1; for ( keys %set_a ) { if ( exists $set_b{$_} ) { my $res = $set_a{$_} <=> $set_b{$_}; if ( ! $res ) { delete $set_a{$_}, delete $set_b{$_} } elsif ( $res < 0 ) { $set_b{$_} -= delete($set_a{$_}); } else { $set_a{$_} -= delete($set_b{$_}); } } next if ! exists $set_a{$_}; my $m = $_; for my $n ( keys %set_b ) { my $gcd = gcd($m, $n); if ($gcd > 1 ) { ++$set_a{$m / $gcd} if $m != $gcd; ++$set_b{$n / $gcd} if $n != $gcd; ! --$set_a{$m} && delete $set_a{$m}; ! --$set_b{$n} && delete $set_b{$n}; $finished = 0, last; } } } } } __END__ __C__ /* Implementation of Euclid's Algorithm by fizbin */ int gcd(int m, int n) { while( 1 ) { if (n==0) {return m;} m %= n; if (m==0) {return n;} n %= m; } }

It doesn't produce the output you wanted so I didn't bother trying to optimize it WRT choosing pairings. With the lack of optimizations it is likely not that efficient - but it was fun.

Cheers - L~R

Replies are listed 'Best First'.
Re^2: Algorithm for cancelling common factors between two lists of multiplicands
by BrowserUk (Patriarch) on Aug 10, 2005 at 01:58 UTC
    It doesn't produce the output you wanted ...

    As I mentioned in 482012, the exact reduction isn't important. The results of your 3 manual steps (33,5)(2,4,23) and my 5 manual steps (33,5)(8,23) are completely equivalent.

    Adding a dependancy upon Inline::C is heavier than Math::Pari, though GCD could be written in perl of course.

    Your algorithm certainly works. However, using GCD requires multiple, O(m*n) passes over the datasets. hv's prime factors approach eliminates the multiple passes and seems, on the small sets I've tried, to get the results more quickly, even with pure perl factoring code.

    Coming up with an efficient, pure perl implementation of prime factorisation (limiting myself to integers 0 .. 2**32 ) is my current fun challenge :)


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      BrowserUk,
      I misinterpreted "Other reductions possible" to read "Further reductions possible" given the example and where the commentary appeared. Admittedly I haven't read the entire thread and was under the incorrect impression for any given input there was only 1 acceptable output. I was also under the incorrect impression that you were not interested in prime factorization but what would be the result if the problem of "cancelling out" had been attacked by hand.

      I believe the approach I presented could be made more efficient though I doubt it would be able to compete with a good prime factorization algorithm. I am familiar with several, some may have even been used in this thread, such as the quadratic sieve - but I don't currently have the time to write a perl implementation. IIRC numbers smaller than a certain size are best done with 1 algorithm and larger than a different number with a different algorithm with the numbers falling in the middle being a toss up.

      If I do come up with anything I will be sure to let you know. Thanks for the fun problem.

      Cheers - L~R

      BrowserUk,

      I would be curious to see the results if you ever benchmark the GCD approach with the prime factor approach on the problem size you were mentioning

      The GCD Approach is O(m*n)*O(GCDalgo) but I am not sure how the complexity reduces for the prime factorization approach.

      You need to calculate the prime factor each of the product value right? i.e. if you have to do 100*101*102*103*104/57*58*59 then you need factor for each of the values correct?

      The GCD approach will loop (5 * 3)*O(GCDalgo) times. But prime factor approach needs to calc factors for 8 values and that could potentially loop a lot more per value (max upto the value) and the complexity will be more than the GCD approach.

      I ran the GCD approach and took about 9 minutes (not a formal benchmark) just to cancel out terms. Don't have any math package to do big multiplications/divisions.

      I even tried to reduce the inner loop by switching @a and @b but did not see a huge impact

      Also, i am confused whether the Haskell code actually computed the example (40000+!) because the execution speed was just unbelievable. Did it loose any precision when you tested it? I guess the approach was similar in canceling out terms. It makes me wonder why Perl canceling out takes so much longer than Haskell implementation

      will be happy to hear on how your final implementation look.

      Thanks,

      SK

        These are the results I have currently for the inputs [989 9400 43300 2400]:

        tmoertel's haskell implementation of the cancelling code ( in 2.6 seconds):

        [11:20:43.54] C:\ghc\ghc-6.4\code>FET < ex1.dat +8.070604647867604097576877675243109941729476950993498765160880e-7030 [11:20:46.34] C:\ghc\ghc-6.4\code>

        Using Math::Pari (in 26 ms):

        P:\test>MP-FET.pl 8.070604647867604097576877668E-7030 1 trial of _default ( 24.817ms total), 24.817ms/trial

        Using Math::BigInt (Yes. That 4 hrs 38 minutes!)

        P:\test>MBF-FET.pl 8.070604647867604097576877675243109941729e-7030 1 trial of _default ( 16,699s total), 16,699s/trial

        The whole point of the cancelling code is that you do not have to calculate the huge factorials in infinite precison (slow) math, because you reduce the factorials to products of sets of much smaller, prime factors and then cancel most of those.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
        Looking at my copy of The Art of Computer Programming: Seminumerical Methods, 3E, Knuth gives the worst case as:
        O(GCD) ~= 4.785*log(N) + 0.6723
        where N is the smaller argument to Euclid's algorithm. The average is much better:
        Avg(GCD) ~= 0.1775*N
        Update: hv pointed out that the approximation for the mean didn't look right. I went back to the source, and see that I pulled out an intermediate result (I have to read more closely in the future). So it's not the mean.

        The mean is given as

        Avg(GCD) ~= 1.9405*log10(N)
        where N is the smaller argument to Euclid's algorithm, and ignoring a subtracted correction term based on the divisors of N.

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

      Coming up with an efficient, pure perl implementation of prime factorisation (limiting myself to integers 0 .. 2**32 ) is my current fun challenge :)
      You might have a look at Math::Big::Factors, which I found useful, and is pure Perl. (Note that the docs have a typo, and mention factor_wheel where the function is actually named factors_wheel.)

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        Without actually trying it, Math::Big::Factors says:

        For very small numbers the computation of the wheel of order 7 may actually take longer than the factorization, but anything that has more than 10 digits will usually benefit from order 7.

        As I'm limiting myself to 2**32, it probably won't help?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.