in reply to Re^5: Algorithm for cancelling common factors between two lists of multiplicands
in thread Algorithm for cancelling common factors between two lists of multiplicands
... is there a reason you aren't using the Chisquare test?
I'm not trying to solve the sample problem. That was just an example I found on the web and used as a test.
I set out to solve the problem of performing the FET using Perl. First I did it with Math::Pari, but which gives a result of 8.070604647867604097576877668E7030 in 26ms, but I was unsure about the accuracy. It also imposes a binary dependancy.
#! perl slw use strict; use Benchmark::Timer; use List::Util qw[ sum reduce ]; use Math::Pari qw[ factorial ]; $a=$b; sub product{ reduce{ $a *= $b } 1, @_ } sub FishersExactTest { my @data = @_; return unless @data == 4; my @C = ( sum( @data[ 0, 2 ] ), sum( @data[ 1, 3 ] ) ); my @R = ( sum( @data[ 0, 1 ] ), sum( @data[ 2, 3 ] ) ); my $N = sum @C; my $dividend = product map{ factorial $_ } grep $_, @R, @C; my $divisor = product map{ factorial $_ } grep $_, $N, @data; return $dividend / $divisor; } my $T = new Benchmark::Timer; $T>start( '' ); print FishersExactTest 989, 9400, 43300, 2400;; $T>stop( '' ); $T>report; __END__ P:\test>MPFET.pl 8.070604647867604097576877668E7030 1 trial of _default ( 25.852ms total), 25.852ms/trial
So, then I coded it using Math::BigFloat
#! perl slw use strict; use Benchmark::Timer; use List::Util qw[ reduce ]; use Math::BigFloat; $a=$b; sub product{ reduce{ $a *= $b } 1, @_ } sub sum{ reduce{ $a += $b } 0, @_ } sub FishersExactTest { my @data = map{ Math::BigFloat>new( $_ ) } @_; return unless @data == 4; my @C = ( sum( @data[ 0, 2 ] ), sum( @data[ 1, 3 ] ) ); my @R = ( sum( @data[ 0, 1 ] ), sum( @data[ 2, 3 ] ) ); my $N = sum @C; my $dividend = product map{ $_>bfac } grep $_, @R, @C; my $divisor = product map{ $_>bfac } grep $_, $N, @data; return $dividend / $divisor; } my $T = new Benchmark::Timer; $T>start( '' ); print FishersExactTest 989, 9400, 43300, 2400;; $T>stop( '' ); $T>report;
But that ran for 20 minutes without producing any output before I killed it (I've set it running again now, and my machines fan has been thrashing at full speed for the last 25 minutes).
Whilst I was waiting for the BigFloat version, I coded this version which attempts to reduce the size of the problem by eliminating (exactly common) factors:
sub FishersExactTest2 { my @data = @_; return unless @data == 4; my @C = ( sum( @data[ 0, 2 ] ), sum( @data[ 1, 3 ] ) ); my @R = ( sum( @data[ 0, 1 ] ), sum( @data[ 2, 3 ] ) ); my $N = sum @C; my %dividends; $dividends{ $_ }++ for map{ factors $_ } grep $_, @ +R, @C; my %divisors; $divisors { $_ }++ for map{ factors $_ } grep $_, $ +N, @data; for my $i ( keys %divisors ) { if( exists $dividends{ $i } ) { $divisors{ $i }, $dividends{ $i } while $divisors{ $i } and $dividends{ $i }; delete $divisors { $i } unless $divisors { $i }; delete $dividends{ $i } unless $dividends{ $i }; } } my $dividend = product( map{ ( $_ ) x $dividends{ $_ } } keys %div +idends ); my $divisor = product( map{ ( $_ ) x $divisors { $_ } } keys %div +isors ); return $dividend / $divisor; }
This works well for values smallish values, but cannot handle the example I gave above (NV overflow).
It was then I started thinking about how to eliminate more factors from the equation so as to reduce the size of the intermediate terms, and posted my SoPW. I think that hv's solution of expanding all terms to their prime factorizations before performing the cancelling out will be a winnerbut I haven't finished coding that yet.


Replies are listed 'Best First'.  

Re^7: Algorithm for cancelling common factors between two lists of multiplicands
by tmoertel (Chaplain) on Aug 09, 2005 at 00:40 UTC  
by BrowserUk (Patriarch) on Aug 09, 2005 at 01:18 UTC  
by tmoertel (Chaplain) on Aug 09, 2005 at 01:40 UTC  
by BrowserUk (Patriarch) on Aug 09, 2005 at 01:50 UTC  
by tmoertel (Chaplain) on Aug 09, 2005 at 02:02 UTC  
