Algorithm for cancelling common factors between two lists of multiplicands

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Given 2 arrays of numbers, I want an efficient algorithm for "cancelling out". (There is probably a proper name for this, but I don't know it).

Most easily described by example. Given:

@a = ( 10, 20, 33, 45, 60 );

@b = ( 2, 5, 10, 12, 16, 23, 45 );

I want to cancel values between the arrays to end up with:

@c = ( 5, 33 ); Please note that 5 does not appear in @a.

@d = ( 8, 23 ); And 8 does not appear in @b.

Note. This is a trivial example. Both the sizes of the arrays and the values will be much larger.

Update: This is not the elimination of duplicates or intersection of arrays. Once any duplicates have been eliminated, it then becomes a task of reducing factors from @a & @b unil no further factors can be removed.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

2005-08-16 Retitled by Arunbear, as per Monastery guidelines
Original title: 'Algorithm wanted'

Comment on Algorithm for cancelling common factors between two lists of multiplicands Select or Download Code

Replies are listed 'Best First'.
Re: Algorithm for cancelling common factors between two lists of multiplicands by hv (Prior) on Aug 08, 2005 at 19:58 UTC
Unless the individual values get very large, I think the easiest would be to factor them: `sub combine_factors { my $list = shift; my $factors = {}; for (@$list) { for (factors($_)) { ++$factors->{$_}; } } $factors; } my $fa = combine_factors(\@a); my $fb = combine_factors(\@b); ($fa->{$_} \|\|= 0) -= $fb->{$_} for keys %$fb; my @c = map +($_) x $fa->{$_}, grep $fa->{$_} > 0, keys %$fa; my @d = map +($_) x -$fa->{$_}, grep $fa->{$_} < 0, keys %$fa;` [download] This has the disadvantage that it will leave a list of primes: `@c == (3, 5, 11), @d == (2, 2, 2, 23)`. I'm not sure how you'd retain the original composites (or parts thereof), nor how you'd define which composites to keep, so I chose to ignore that aspect of it. Hugo	[reply] [d/l] [select]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 08, 2005 at 20:20 UTC
Thankyou hv. That'll work. I don't suppose you know of a good factoring algorithm? I'd like to avoid Math::Pari (if possible) for this. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by hv (Prior) on Aug 09, 2005 at 00:07 UTC
For simple problems I usually use something like this: `{ my @p; BEGIN { @p = (2, 3) } sub nextprime { my $p = $p[$#p]; DIV: while (1) { $p += 2; my $pc = 0; while (1) { my $d = $p[$pc++]; last if $d * $d > $p; next DIV unless $p % $d; } $p[@p] = $p; return $p; } } sub factors { my $n = shift; my($pc, @result) = (0); while ($n > 1) { my $d = $p[$pc++] \|\| nextprime(); if ($d * $d > $n) { push @result, $n; $n = 1; } else { while ($n % $d == 0) { $n /= $d; push @result, $d; } } } \@result; } }` [download] The rest of the time I use Math::Big*, sometimes with Pari or (lately) GMP. Hugo	[reply] [d/l]
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 09, 2005 at 00:27 UTC
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by hv (Prior) on Jun 10, 2022 at 17:19 UTC
I don't suppose you know of a good factoring algorithm? From the future: these days, Math::Prime::Util (along with Math::Prime::Util::GMP for larger inputs) is excellent.	[reply]
Re: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 08, 2005 at 20:14 UTC
The same exampled worked through: `@a = ( 10, 20, 33, 45, 60 );` `@b = ( 2, 5, 10, 12, 16, 23, 45 );` Eliminate common elements: `@a = ( 20, 33, 60 );` `@b = ( 2, 5, 12, 16, 23 );` Cancel 2 with 20 to give 1 and 10 (discard the 1): `@a = ( <strike>20</strike><ins>10</ins>, 33, 60 );` `@b = ( <strike>2</strike><ins>1</ins>, 5, 12, 16, 23 );` Cancel 5 with 10 to give 1 and 2: `@a = ( <strike>10</strike><ins>2</ins>, 33, 60 );` `@b = ( <strike>5</strike><ins>1</ins>, 12, 16, 23 );` Cancel 2 with 12 to give 1 and 6 (discard the 1 ) `@a = ( 2, 33, 60 );` `@b = ( 12, 16, 23 );` Cancel 6 with 60 to give 1 and 10 (discard the 1 ) `@a = ( 33, 60 );` `@b = ( 6, 16, 23 );` Cancel 16 with 10 to give 8 and 5 `@a = ( 33, 10 );` `@b = ( 16, 23 );` Result: (Other reductions are possible!) `@a = ( 33, 5 );` `@b = ( 8, 23 );` Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l] [select]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by ikegami (Patriarch) on Aug 08, 2005 at 20:42 UTC
If the solution in my earlier node doesn't return what you want, this surely will: use Math::BigInt (); gcd = \&Math::BigInt::bgcd; sub embiggen { local $_ = @_ ? $_[0] : $_; return Math::BigInt->new($_); } my @a = ( 10, 20, 33, 45, 60 ); my @b = ( 2, 5, 10, 12, 16, 23, 45 ); # my %a = map { $_ => 1 } @a; # my %b = map { $_ => 1 } @b; # # my @c = map embiggen, # grep { !$b{$_} } # @a; # # my @d = map embiggen, # map { Math::BigInt->new($_) } # grep { !$a{$_} } # @b; my @c = @a; my @d = @b; foreach my $c (@c) { foreach my $d (@d) { my $gcd = gcd($c, $d); $c /= $gcd; $d /= $gcd; } } @c = grep { $_ != 1 } @c; @d = grep { $_ != 1 } @d; [download] or this messier but potentially faster alternative: use Math::BigInt (); use Math::Big::Factors (); gcd = \&Math::BigInt::bgcd; *factor = \&Math::Big::Factors::factors_wheel; sub embiggen { local $_ = @_ ? $_[0] : $_; return Math::BigInt->new($_); } my @a = ( 10, 20, 33, 45, 60 ); my @b = ( 2, 5, 10, 12, 16, 23, 45 ); # my %a = map { $_ => 1 } @a; # my %b = map { $_ => 1 } @b; # # my @c = map embiggen, # grep { !$b{$_} } # @a; # # my @d = map embiggen, # grep { !$a{$_} } # @b; my @c = @a; my @d = @b; my %c_factors; foreach my $c_idx (0..$#c) { my @c_factors = factor($c[$c_idx]); foreach my $c_factor (@c_factors) { push(@{$c_factors{$c_factor}}, $c_idx); } } foreach my $d (@d) { my @d_factors = factor($d); foreach my $d_factor (@d_factors) { next unless $c_factors{$d}; next unless @{$c_factors{$d}}; my $c_idx = shift(@{$c_factors{$d}}); $c[$c_idx] /= $d_factor; $d /= $d_factor; } } @c = grep { $_ != 1 } @c; @d = grep { $_ != 1 } @d; [download] Both untested.	[reply] [d/l] [select]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by sk (Curate) on Aug 08, 2005 at 22:43 UTC
Update I wrote this when we were trying to figure out the exact output. Not relevant anymore ~~ikegami,~~ Your first should work correctly. Did not read the second one yet. One thing you want to note when you test this code is that the result will not match exactly what BrowserUk had coz the cancellation process is kind of subjective when done by humans as opposed to computers ~~take for example if i have `15 * 3 * 12 / 9 * 17`~~ ~~then I could either do 5 * 12/17 or 15 * 4/17~~ ~~I guess the final solution should match product of the elements in the array~~ ~~Another thing, you might get a tiny little benefit if you check for the elements == 1 inside the loop construct and pop them out. I am not sure if that is even possible.~~ BTW, i think you can add `next if $c == 1 or $d == 1;` to avoid computing gcd's on 1's. -SK	[reply] [d/l] [select]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by sk (Curate) on Aug 08, 2005 at 20:39 UTC
Here is my shot at a pseudo-code 1. Pick the first element of array a 2. Calcuate the GCD(a[0],b[0]) or better yet ~~a[0] = mod (a[0], gcd(a[0],b[0]) and b[0] =mod (a[0], gcd(b[0],b[0])~~ (just divide elements by gcd, it should be an integer) 3. If anything is 1 pop it out of the list 4. Go through the list of elements for b with a[0] again. Do this until all elements are scanned in b. 5. Take the second element in a and repeat the process. Does this work? I shall post the code if i get a chance to implement it -SK PS: This reminds of division in high school days ;)	[reply]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 08, 2005 at 20:50 UTC
PS: This reminds of division in high school days ;) I guess that's where the "cancelling out" phrase comes from :) Maybe the confusion comes from the fact that most people younger than me never learned to cancel out their equations because they all used calculators from an early age. They probably don't know what "casting out the nines" is either. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by gri6507 (Deacon) on Aug 08, 2005 at 20:39 UTC
Why not do this: foreach, element in an array, factor to primes (i.e. if starting from 10, then 125). Create a hash for for the first array where keys are factors of each element and values are the number of times each factor has occurred. Repeat for second array. "Subtract" the two hashes. Multiply the resulting keys by their respecting values --> should give you the elements of the resulting array.	[reply]
Re: Algorithm for cancelling common factors between two lists of multiplicands by GrandFather (Saint) on Aug 08, 2005 at 19:41 UTC
Will the arrays have equal numbers of values? Will there be repeated values in teh individual arrays (1, 1, 10, 20, ...)? Could the arrays fit in (virtual) memory or are they disk based and too big? Perl is Huffman encoded by design.	[reply]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 08, 2005 at 19:48 UTC
Will the arrays have equal numbers of values? Unimportant. Will there be repeated values in teh individual arrays (1, 1, 10, 20, ...)? Possibly. Could the arrays fit in (virtual) memory or are they disk based and too big? They will fit in memory. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]
Re: Algorithm for cancelling common factors between two lists of multiplicands by sgifford (Prior) on Aug 08, 2005 at 20:00 UTC
Hi BrowserUK, I'm not understanding what you mean by cancelling out. Can you explain a bit more thoroughly? I see you've updated the original node, but it's still not clear to me. If an item in `b` is a factor of an item in `a`, should both be eliminated? Or if they have a factor in common? Or...? Also, will the arrays be sorted, as they are in your example?	[reply] [d/l] [select]
Re: Algorithm for cancelling common factors between two lists of multiplicands by techra (Pilgrim) on Aug 08, 2005 at 20:32 UTC
I'm curious when I see a post like this as to why? Not only why do you have this data that you want to crunch in this way but also, what do you get out of these crunched numbers? I understand that often (myself included) we're restricted by NDAs to actually talk about the application of our work, but something like this just makes me wonder what's going on and why do I never come across any problems that this becomes a question?	[reply]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 08, 2005 at 20:45 UTC
Because getting accurate answers to this equation takes a very long time unless you use arithmetic methods to reduce the problem. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by tmoertel (Chaplain) on Aug 08, 2005 at 22:24 UTC
Can you provide a sample data set (i.e., matrix) that you consider to be "large"? Also, could you give me an idea of how long it takes to compute P_cutoff without using arithmetic optimizations? (I would like to try out a quick Haskell-based implementation I whipped up on a real data set.) Tom Moertel : Blog / Talks / CPAN / LectroTest / PXSL / Coffee / Movie Rating Decoder	[reply]
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 08, 2005 at 22:58 UTC
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by tmoertel (Chaplain) on Aug 08, 2005 at 23:16 UTC
Some notes below your chosen depth have not been shown here
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by tmoertel (Chaplain) on Aug 08, 2005 at 23:31 UTC
Some notes below your chosen depth have not been shown here
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by sk (Curate) on Aug 08, 2005 at 23:05 UTC
Some notes below your chosen depth have not been shown here
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by Limbic~Region (Chancellor) on Aug 10, 2005 at 15:30 UTC
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by jimX11 (Friar) on Aug 09, 2005 at 01:07 UTC
The original problem, framed in terms of arrays, is to reduce a fraction. The equation above is less general. The less general problem can be restated as reduce a fraction where the numerator and denominator are both the product of factorials. An example of the original problem is to reduce (9 * 8 * 8 * 8 * 6 * 4 )/(5 * 3 * 2) An example of the factorial problem is to reduce (9! * 8! * 8! * 8! * 6! * 4! )/(5! * 3! * 2!) Do I understand correctly, are you are interested in the factorial type of problem?	[reply]
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 09, 2005 at 01:35 UTC
Re: Algorithm for cancelling common factors between two lists of multiplicands by QM (Parson) on Aug 09, 2005 at 03:24 UTC
There are better ways to compute factorial quotients without overflow or loss of precision. But I won't go into that here. Instead I'll address the problem you stated originally. After examining all of the replies to date, it seems you want to reduce the quotient. Since you're dealing with large numbers (products of factorials), reducing it won't be so easy. Here's a school-boy method that just relies on breaking the process down into steps, nothing fancy: First in pseudo-code: 1) Factor each term in the numerator and denominator into its prime factors. (Since these are factorials, this is pretty quick). 2) Collect all of the numerator factors into one array, and all of the denominator factors in another array. 3) Sort both arrays. 4) Do a modified merge sort to cancel like terms in both arrays. Keep any terms that don't have a match in the other array. 5) You can compute the quotient to a high degree of accuracy without overflowing or underflowing by judicious choices in the next term to grab. Now let's see if I can do this in code (untested): my @num = ( 3, 7, 11 ); my @den = ( 5, 12, 19 ); # get prime factors (you have to write this yourself :) # it returns the list of prime factors for each argument my @num_fac = get_prime_factors_list( 2..$num[-1] ); my @den_fac = get_prime_factors_list( 2..$den[-1] ); @num_fac = sort { $a <=> $b } @num_fac; @den_fac = sort { $a <=> $b } @den_fac; # modified merge sort my $n, $d; while (( $n < $#num_fac ) and ( $d < $#den_fac )) { if ( $num_fac[$n] == $den_fac[$d] ) { # drop both $n++; $d++; } elsif ( $num_fac[$n] < $den_fac[$d] ) { push @num_fin, $num_fac[$n]; $n++; } else { push @den_fin, $den_fac[$d]; $d++; } } # any remaining factors are not shared push @num_fin, @num_fac[$n..$#num_fac]; push @den_fin, @den_fac[$d..$#den_fac]; # edge cases @num_fin = (1) unless @num_fin; @den_fin = (1) unless @den_fin; # carefully multiply and divide my $q = shift @num_fin; while ( @num_fin and @den_fin ) { if ( $q > 1 ) { $q = $q / (shift @den_fin); } if ( $q < 1 ) { $q = $q * (shift @num_fin); } } # only one of these will be true: $q = $q * (shift @num_fin) while @num_fin; $q = $q / (shift @den_fin) while @den_fin; [download] You can probably improve this considerably. For instance, instead of keeping each prime factor (including duplicates), only keep a count. -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 09, 2005 at 09:06 UTC
There are better ways to compute factorial quotients without overflow or loss of precision. So, now I've got my "schoolboy approach" working, care to point me at the methods clever people would use? Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by QM (Parson) on Aug 09, 2005 at 15:19 UTC
So, now I've got my "schoolboy approach" working, care to point me at the methods clever people would use? I was too lazy to look before, but now that I have, I was a bit off. On the other hand, I snuck in something similar in step 5 anyway. Here's what I remembered when I said that: In QOTW #2 MJD considers the n_choose_k function that computes: `sub n_choose_k { my ($n, $k) = @_; # f($n) = $n! return f($n)/f($k)/f($n-$k); }` [download] He notes that the intermediate values are very large, even though the final result is not. He submits this replacement: `sub n_choose_k { my ($n, $k) = @_; my $t = 1; for my $i (1 .. $k) { $t = $n - $k + $i; $t /= $i; } return $t; }` [download] and he notes: $t here is always an integer, and never gets bigger than necessary.* For your formula, this only applies to a small portion of the terms. You may be better off just ignoring MJD's improvement and proceeding as I outlined before. -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l] [select]
Re: Algorithm for cancelling common factors between two lists of multiplicands by ikegami (Patriarch) on Aug 08, 2005 at 19:46 UTC
If the arrays don't contain duplicate elements: `$c{$_}++ foreach @a; $c{$_}++ foreach @b; @c = grep { $c{$_} != 2 } @a; @d = grep { $c{$_} != 2 } @b;` [download]	[reply] [d/l]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by ikegami (Patriarch) on Aug 08, 2005 at 19:49 UTC
This works whether the arrays contain duplicate elements or not: `my %a = map { $_ => 1 } @a; my %b = map { $_ => 1 } @b; @c = grep { !$b{$_} } @a; @d = grep { !$a{$_} } @b;` [download]	[reply] [d/l]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 08, 2005 at 19:50 UTC
Please look again. This is not the elimination of duplicates. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by ikegami (Patriarch) on Aug 08, 2005 at 19:55 UTC
oops, you're right. I can't figure out what the pattern is. What does he mean by "canceling out"? Update: Here's a solution: `use Math::BigInt (); use Math::Big::Factors (); my $a = Math::BigInt->new(1); $a = $_ foreach @a; # 17820000 my $b = Math::BigInt->new(1); $b = $_ foreach @b; # 19872000 my $gcd = Math::BigInt::bgcd($a, $b); # 108000 my $c = $a / $gcd; # 165 my $d = $b / $gcd; # 184 my @c = Math::Big::Factors::factors_wheel($c); # 3, 5, 11 my @d = Math::Big::Factors::factors_wheel($d); # 2, 2, 2, 23` [download] Not quite the solution you asked for. Close enough?	[reply] [d/l]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by xdg (Monsignor) on Aug 08, 2005 at 20:05 UTC
Is this a factoring problem? Math::Factor and Math::Big::Factors may have useful routines. In particular Math::Factor has the subroutine "match" that sounds something like what you're trying to do, though as others have stated, it's still not clear what exactly that is. -xdg Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.	[reply]
Re: Algorithm for cancelling common factors between two lists of multiplicands by salva (Canon) on Aug 08, 2005 at 19:55 UTC
`$i{$_}++ for (@a); $i{$_}-- for (@b); @c=grep { $i{$_} > 0 } keys @a; @d=grep { $i{$_} < 0 } keys @b;` [download] btw, are your sample solution arrays right?	[reply] [d/l]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 08, 2005 at 19:57 UTC
This is not intersection of arrays! btw, are your sample solution arrays right? Yes! Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]
Re: Algorithm for cancelling common factors between two lists of multiplicands by tlm (Prior) on Aug 09, 2005 at 01:17 UTC
Memoize the logs of the factorials. BTW, I posted some FET code earlier that does just that; if there's something wrong with it, please let me know. the lowliest monk	[reply]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 09, 2005 at 01:25 UTC
Try it. You'll find that doing combinatorial math with logs of factorials rapidly introduces large errors. For an example see Re^4: Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?. That's just one. Once you start adding them together the inaccuracies multiply. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 10, 2005 at 00:41 UTC
BTW, I posted some FET code earlier that does just that; if there's something wrong with it, please let me know. Trying your code with these inputs `print Fishers_Exact::fishers_exact( 989, 9400, 43300, 2400 );` [download] I get these results: P:\test>FET-tlm.pl 0.99999999999999999999999999999999999999999999999[... 950 9s truncated +] 9999999999999999999999999999999999999999999999999[... 950 9s truncated +] 9999999999999999999999999999999999999999999999999[... 950 9s truncated +] 9999999999999999999999999999999999999999999999999[... 950 9s truncated +] 9999999999999999999999999999999999999999999999999[... 950 9s truncated +] 9999999999999999999999999999999999999999999999999[... 950 9s truncated +] 9999999999999999999999999999999999999999999999999[... 950 9s truncated +] 99999999999999999999999999999999952665545273822897512768877675 1 trial of _default ( 1.385s total), 1.385s/trial [download] Now, given that your code needs Math::Pari, and the "simple" solution using that module produced the correct answer in 26ms, I can't help wonder what all that messing around with Stirling and Gosper approximations bought you? I also think that if you are going to require exporter, you might as well let it do it's job and skip all that autoload stuff. And I personally find that littering your code with a dozen BEGIN{} blocks makes it very hard to follow. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l] [select]
Re: Algorithm for cancelling common factors between two lists of multiplicands by tlm (Prior) on Aug 09, 2005 at 13:53 UTC
Here's a another solution. It takes two array refs, makes copies of them, and cancels out common terms, removing all 1's. sub list_reduce { my ( $top, $bot ) = map [ grep $_ != 1, @$_ ], @_; die "Bad data\n" if ( grep $_ < 1, @$top, @$bot ); OUTER: for my $i ( reverse 0 .. $#$top ) { for my $j ( reverse 0 .. $#$bot ) { ( $top->[ $i ], $bot->[ $j ] ) = reduce( $top->[ $i ], $bot->[ $ +j ] ); splice @$bot, $j, 1 if $bot->[ $j ] == 1; if ( $top->[ $i ] == 1 ) { splice @$top, $i, 1; next OUTER; } } } return ( $top, $bot ); } sub reduce { my ( $top, $bottom ) = @_; my $gcd = Math::BigInt::bgcd( $top, $bottom ); return ( $top/$gcd, $bottom/$gcd ); } [download] the lowliest monk	[reply] [d/l] [select]
Re: Algorithm for cancelling common factors between two lists of multiplicands by Limbic~Region (Chancellor) on Aug 09, 2005 at 16:26 UTC
BrowserUk, The order and pairings you choose to cancel out affect the result. This may prevent optimizations that seek to cancel out terms in a different order (such as largest terms first). For instance in your example, it took 6 cancelings to get the resut. Here is another possibility that doesn't result in prime factorization but only takes 3 pairings: Read more... (477 Bytes) My algorithm doesn't require prime factorization (just GCD). Read more... (2 kB) It doesn't produce the output you wanted so I didn't bother trying to optimize it WRT choosing pairings. With the lack of optimizations it is likely not that efficient - but it was fun. Cheers - L~R	[reply] [d/l] [select]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 10, 2005 at 01:58 UTC
It doesn't produce the output you wanted ... As I mentioned in 482012, the exact reduction isn't important. The results of your 3 manual steps `(33,5)(2,4,23)` and my 5 manual steps `(33,5)(8,23)` are completely equivalent. Adding a dependancy upon Inline::C is heavier than Math::Pari, though GCD could be written in perl of course. Your algorithm certainly works. However, using GCD requires multiple, O(mn) passes over the datasets. hv's prime factors approach eliminates the multiple passes and seems, on the small sets I've tried, to get the results more quickly, even with pure perl factoring code. Coming up with an efficient, pure perl implementation of prime factorisation (limiting myself to integers 0 .. 2*32 ) is my current fun challenge :) Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l] [select]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by Limbic~Region (Chancellor) on Aug 10, 2005 at 12:30 UTC
BrowserUk, I misinterpreted "Other reductions possible" to read "Further reductions possible" given the example and where the commentary appeared. Admittedly I haven't read the entire thread and was under the incorrect impression for any given input there was only 1 acceptable output. I was also under the incorrect impression that you were not interested in prime factorization but what would be the result if the problem of "cancelling out" had been attacked by hand. I believe the approach I presented could be made more efficient though I doubt it would be able to compete with a good prime factorization algorithm. I am familiar with several, some may have even been used in this thread, such as the quadratic sieve - but I don't currently have the time to write a perl implementation. IIRC numbers smaller than a certain size are best done with 1 algorithm and larger than a different number with a different algorithm with the numbers falling in the middle being a toss up. If I do come up with anything I will be sure to let you know. Thanks for the fun problem. Cheers - L~R	[reply]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by sk (Curate) on Aug 10, 2005 at 07:56 UTC
BrowserUk, I would be curious to see the results if you ever benchmark the GCD approach with the prime factor approach on the problem size you were mentioning The GCD Approach is O(mn)O(GCDalgo) but I am not sure how the complexity reduces for the prime factorization approach. You need to calculate the prime factor each of the product value right? i.e. if you have to do `100101102103104/575859` then you need factor for each of the values correct? The GCD approach will loop (5 * 3)*O(GCDalgo) times. But prime factor approach needs to calc factors for 8 values and that could potentially loop a lot more per value (max upto the value) and the complexity will be more than the GCD approach. I ran the GCD approach and took about 9 minutes (not a formal benchmark) just to cancel out terms. Don't have any math package to do big multiplications/divisions. I even tried to reduce the inner loop by switching @a and @b but did not see a huge impact Also, i am confused whether the Haskell code actually computed the example (40000+!) because the execution speed was just unbelievable. Did it loose any precision when you tested it? I guess the approach was similar in canceling out terms. It makes me wonder why Perl canceling out takes so much longer than Haskell implementation will be happy to hear on how your final implementation look. Thanks, SK	[reply] [d/l]
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 10, 2005 at 10:35 UTC
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by sk (Curate) on Aug 10, 2005 at 16:38 UTC
Some notes below your chosen depth have not been shown here
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by QM (Parson) on Aug 10, 2005 at 18:39 UTC
Some notes below your chosen depth have not been shown here
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by tmoertel (Chaplain) on Aug 11, 2005 at 05:04 UTC
Some notes below your chosen depth have not been shown here
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by QM (Parson) on Aug 10, 2005 at 18:36 UTC
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by QM (Parson) on Aug 10, 2005 at 17:51 UTC
Coming up with an efficient, pure perl implementation of prime factorisation (limiting myself to integers 0 .. 232 ) is my current fun challenge :) You might have a look at Math::Big::Factors, which I found useful, and is pure Perl. (Note that the docs have a typo, and mention `factor_wheel` where the function is actually named `factors_wheel`.) -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l] [select]
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 10, 2005 at 18:48 UTC
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by QM (Parson) on Aug 10, 2005 at 22:46 UTC
Some notes below your chosen depth have not been shown here
Re: Algorithm for cancelling common factors between two lists of multiplicands by QM (Parson) on Aug 11, 2005 at 23:33 UTC
I have toyed with this more, and have a pure Perl solution that runs your example in about 7 minutes on my PC, and all but 5 seconds is the factoring by Math::Big::Factors. I have an idea about prefactoring up to some limit, which I'll report on later. You might note that 232 is not a practical limit, unless most of the terms cancel out. For instance, if you had to compute (232)!/(231)!, that would be 231 terms in the numerator. Can you comment on the practical limits of your application? For your example, which is essentially: `10389! 4570! 44289! 11800! ------------------------------ 56089! 989! 9400! 43300! 2400!` [download] and comparing results: `8.070604647867604097576877675243109941805e-7030 8.070604647867604097576877675243109941729476950993498765160880e-7030` [download] (Note that MBF actually moved the decimal and gave the exponent as "-7069" -- I've normalized it to the other result.) Here's my code: Read more... (3 kB) Update: I ran some variations, here's the time comparisons: Order = 5, Accuracy = 40, time = 7 min Order = 6, Accuracy = 60, time = 35 min (!!) no factoring, Accuracy = 40, time = 28 sec no factoring, Accuracy = 60, time = 30 sec no factoring, Accuracy = 40, eval, time = 14.0 sec no factoring, Accuracy = 60, eval, time = 14.7 sec The "eval" version changes this code: `foreach my $t ( keys %{$factors} ) { my $bft = $ONE->copy()->bmul($t)->bpow($factors->{$t}); $q->bmul($bft); }` [download] to this: `my $s = '$q'; foreach my $t ( keys %{$factors} ) { next if ( $factors->{$t} == 0 ); if ( $factors->{$t} > 0 ) { $s .= "->bmul($t)" x $factors->{$t}; } else { $s .= "->bdiv($t)" x abs($factors->{$t}); } } eval $s; die "Error on eval: $@" if $@;` [download] I tried it with `->bpow()`, but that's 2X slower. (I'd offer that `->bpow()` should only be used for non-integer powers.) Seems `eval` has its uses! Update 2: I made a typo in above on bdiv -- should have used abs. It's been changed, and the timings updated (roughly doubled). -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l] [select]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 12, 2005 at 02:59 UTC
My best performing pure perl attempt so far runs that testcase in just over 8 seconds: `[ 3:24:33.57] P:\test>fet3 989 9400 43300 2400 [989 9400 43300 2400] 0.80706046478676263e-7029 1 trial of _default ( 8.164s total), 8.164s/trial` [download] The precision is limited to that of a standard double, but I played some games with scaling during the final calculation in order to achieve a range beyond that of a 1e±308 of a double. Theortically, it will now handle some pretty big numbers, but the limiting factor is memory as I am generating some pretty big lists. I should be able to alleviate that substantially by implementing Limbic~Region's idea of cancelling the factorials against each other before generating the product lists rather than building all the lists and then cancelling them as I am now. Ie. `10389! 4570! 44289! 11800! ------------------------------ 56089! 989! 9400! 43300! 2400! becomes 44289! 11800! 10389! 4570! ------------------------------ 56089! 43300! 9400! 2400! 989! (\|\|10389..9401) (\|\|4570..2401) --------------------------------------------- (\|\|56089..44290) (\|\|43300..11801) 989! where (\|\|m..n) indicates the product of the series` [download] In this example that reduces the number of factorisations from 183,406 to 47,444, which ought to make a big difference both on speed and memory consumed, and these saving would be even greater for larger observations in all but the most pathelogical of cases. I've had some trouble getting that to work properly though. Encoding the heuristics that seem obvious by inspection of one or two cases, into a generalised algorithm for all inputs, is eluding me for the moment. You might note that 232 is not a practical limit, unless most of the terms cancel out. I agree that 2**32 is very much an upper bound. The main reason for stating it is that it pretty much precludes the benefit of many of the more sophisticated factorisation methods I've read about. It also places a practical upper bound on the number of primes I need. The code (large because of the embedded primes list) Read more... (98 kB) Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l] [select]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by QM (Parson) on Aug 12, 2005 at 18:45 UTC
Don't factor! I removed `map{ primeFactors $_ }` from your FET3 sub, and it runs 5X faster! (The last 4 digits differ between the 2 versions.) -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l]
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by tmoertel (Chaplain) on Aug 12, 2005 at 20:23 UTC
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 12, 2005 at 22:32 UTC
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by QM (Parson) on Aug 12, 2005 at 23:03 UTC
Some notes below your chosen depth have not been shown here
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 12, 2005 at 19:07 UTC
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by sk (Curate) on Aug 12, 2005 at 05:20 UTC
I am not sure if you had a chance to look at my node Re^7: Algorithm for cancelling common factors between two lists of multiplicands I guess my approach is same as Limbic~Region as I was using subtraction of lower factorial terms with higher ones. I copied the input list from one of your previous examples and I have `45700` instead of `4570` but aside from that the implementation should be easy to understand. I guess if you sort numerator and denominator and subtract them you should be all set. I have not proved this formally but here is a stab at a simple proof that shows sorting and subtracting should work... Let the fraction be X! Y! Z! ------- a! b! c! WLOG let's assume that X > Y > Z and a > b > c. Also let's assume that + b > Y , X > a (weaker assumption: Z > c)...<p> NOTE: if we divide X!/a! we will have (X-a) elements <p> To Prove: (X-a) + (Z-c) + (b-Y) is the shortest list one can find or in other words (X-a) + (Z-c) + (b-Y) <= (X-p) + (Z-q) + (r-Y) for any p,q,r in permut +ation(a,b,c) Proof: From the above equation since b > Y and Z > c, r should be equal to ei +ther a or b. If r = b then the solution is trivial<p> If r = a then we get (X-a) + (Z-c) + (b-Y) ?<= (X-b) + (Z-c) + (a-Y) canceling terms -a - c + b ?<= -b -c + a -a + b ?<= -b + a ====> YES since a > b we see that r = a is not the smallest list so r = b<p> Similarly we can also show that (X-a) + (Z-c) + (b-Y) <= (X-a) + (Y-c) + + (b-Z) [download] I don't think this is a rigourous proof this method but i sort of feel sorting and subtracting should give us what we need... cheers SK PS: I think there will be 47448 elements and not 47444 as you suggested? as you need to count the first element too..	[reply] [d/l] [select]
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 12, 2005 at 12:19 UTC
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by tmoertel (Chaplain) on Aug 12, 2005 at 16:40 UTC
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by sk (Curate) on Aug 12, 2005 at 16:57 UTC
Some notes below your chosen depth have not been shown here
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by hv (Prior) on Aug 13, 2005 at 00:20 UTC
Here's my take on it. Highlights: use Eratosthenes' sieve to determine a factor or primality for integers up to the max required; work out when an additional numerator or denominator kicks in (`@switch`), and pass over the numbers in reverse order, so each need be visited only once; build up two BigInts for numerator and denominator and just switch to BigFloats to divide the two at the end. The single division at the end takes over 2/3 of the total time, so I'd expect using GMP or Pari for the maths would cut it greatly: `zen% perl t1 807060464786760409757687767524310994172947695099349876516088e-7089 sieve: 0 wallclock secs ( 0.09 usr + 0.02 sys = 0.11 CPU) build: 4 wallclock secs ( 4.38 usr + 0.00 sys = 4.38 CPU) divide: 9 wallclock secs ( 8.90 usr + 0.00 sys = 8.90 CPU) total: 13 wallclock secs (13.37 usr + 0.02 sys = 13.39 CPU) zen%` [download] Things that made negligible difference: dropping accuracy to 40 (!); initialising `my $bii = Math::BigInt->new($i)` before the multiply loops; the `$r->bsstr` call. I noticed that the numerator consists of 855 prime factors comprising 755 distinct primes; the denominator has 2308 of which 2306 are distinct. No surprise, then, that bpow() isn't a win. Here's the code: Read more... (2 kB) Hugo	[reply] [d/l] [select]
Re: Algorithm for cancelling common factors between two lists of multiplicands by tlm (Prior) on Aug 10, 2005 at 02:32 UTC
From looking at the code you have posted I get the impression that you misunderstand the Fisher's Exact Test. What you are computing in your various implementations is only one term of the sum that makes up the FET statistic. The definition of the FET statistic I'm referring to here is entirely standard (though there is some variability as to whether the one-sided or two-sided version of this sum is used). The test says, if we assume the null hypothesis (independence between the two categorical variables), what is the probability of getting a sampling that is at least as far from the expected value (under the null hypothesis) as the actual data being tested; unless one of the cells is empty, this formulation entails computing a sum of terms. If you look at the code I posted, I don't compute a single ratio of factorials, but a whole sum of them. The value of this sum can be considerably larger than the value of a single term; in fact it can be 1, depending on the hypothesis being tested and the data at hand. Update: Here's the relevant passage from the MathWorld page whose formula you cited elsewhere (my emphasis): To compute the P-value of the test, the tables must then be ordered by some criterion that measures dependence, and those tables that represent equal or greater deviation from independence than the observed table are the ones whose probabilities are added together. the lowliest monk	[reply]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 10, 2005 at 10:13 UTC
I'm am completely aware that all the other terms need to be calculated in order to determine whether the value is significant. I even mentioned this in 482056. And, I guess I could be misusing your code, but when I feed it these values 5 0 4 1, it produces the same output exactly as my "misguided" code. Funnily enough, for those values, it produces the same output as tmoertel's (apparently equally misguided) code. Is it only when given the larger set of test values 989 9400 43300 2400 that your special code does something extraordinarily clever that I'm to stupid to understand? $T->start( '[5 0 1 4]' ); print Fishers_Exact::fishers_exact( 5, 0, 1, 4 ); $T->stop( '[5 0 1 4]' ); $T->report; $T->start( '[989 9400 43300 2400]' ); print Fishers_Exact::fishers_exact( 989, 9400, 43300, 2400 ); $T->stop( '[989 9400 43300 2400]' ); $T->report; __END__ P:\test>FET-tlm.pl 0.02380952380952380952380952380 1 trial of [5 0 1 4] ( 917us total), 917us/trial 0.9999999999999999999999999999999999999999999999999999999999999999999 999999999999999999999999999999999999999999999999999999999999999999999 999999999999999999999999999999999999999999999999999999999999999999999 999999999999999999999999999999999999999999999999999999999999999999999 999999999999999999999999999999999999999999999999999999999999999999999 999999999999999999999999999999999999999999999999999999999999999999999 999999999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999952665545273822897512768877675 1 trial of [5 0 1 4] ( 917us total), 917us/trial 1 trial of [989 9400 43300 2400] ( 1.353s total), 1.353s/trial [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l]
Re^3: Algorithm for cancelling common factors between two lists of multiplicands by tlm (Prior) on Aug 10, 2005 at 14:05 UTC
Of course our results will match in those rare cases in which the FET sum happens to contain only one term. Silence betokens consent Not necessarily ;-) . the lowliest monk	[reply]
Re^4: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 10, 2005 at 14:34 UTC
Re^5: Algorithm for cancelling common factors between two lists of multiplicands by tlm (Prior) on Aug 11, 2005 at 13:24 UTC
Some notes below your chosen depth have not been shown here
Re: Algorithm for cancelling common factors between two lists of multiplicands by Anonymous Monk on Aug 09, 2005 at 12:55 UTC
Seems to me that this problem requires you to factor numbers. Factoring numbers is considered to be a really hard problem - it's what public key encryption makes secure. Noone knows a fast way to factor numbers (well, maybe someone does, but then, they have kept it a secret), so if you are going to have big numbers, the factoring will be the bottleneck.	[reply]
Re^2: Algorithm for cancelling common factors between two lists of multiplicands by BrowserUk (Patriarch) on Aug 09, 2005 at 13:41 UTC
There's big and big. Factoring values of a few thousand or even a thousand million isn't so hard. Cryptography uses numbers much, much bigger than these. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]

Back to Seekers of Perl Wisdom