Hi monks,
I am writing a web-based bioinformatics program that uses the hypergeometric distribution. I use the code from this perl node:
#!/usr/bin/perl -w
use strict;
# ln(x!) = ln(Gamma(x + 1))
sub logfact {
    return gammln(shift(@_) + 1.0);
}
sub hypergeom {
    # There are m "bad" and n "good" balls in an urn.
    # Pick N of them. The probability of exactly i successful selections
    # (the point probability, not the "i or more" tail):
    # (m!n!N!(m+n-N)!)/(i!(n-i)!(m+i-N)!(N-i)!(m+n)!)
    my ($n, $m, $N, $i) = @_;
    my $loghyp1 = logfact($m)+logfact($n)+logfact($N)+logfact($m+$n-$N);
    my $loghyp2 = logfact($i)+logfact($n-$i)+logfact($m+$i-$N)+logfact($N-$i)+logfact($m+$n);
    return exp($loghyp1 - $loghyp2);
}
# Lanczos approximation to ln(Gamma(xx)), as in Numerical Recipes' gammln.
sub gammln {
    my $xx = shift;
    my @cof = (76.18009172947146, -86.50532032941677,
               24.01409824083091, -1.231739572450155,
               0.12086509738661e-2, -0.5395239384953e-5);
    my $y = my $x = $xx;
    my $tmp = $x + 5.5;
    $tmp -= ($x + .5) * log($tmp);
    my $ser = 1.000000000190015;
    for my $j (0..5) {
        $ser += $cof[$j]/++$y;
    }
    -$tmp + log(2.5066282746310005*$ser/$x);
}
print hypergeom(300,700,100,40),"\n";
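For reference, the factorial expression in the comment is the hypergeometric point probability written out in full. A quick check by hand (not from the original post) confirms it gives the probability of exactly i good balls; the "i or more" tail is the sum of these terms:

```latex
P(X = i) = \frac{\binom{n}{i}\binom{m}{N-i}}{\binom{m+n}{N}}
         = \frac{m!\,n!\,N!\,(m+n-N)!}{i!\,(n-i)!\,(m+i-N)!\,(N-i)!\,(m+n)!}

P(X \ge i) = \sum_{k=i}^{\min(n,N)} P(X = k)

% worked example: n = 3, m = 2, N = 2, i = 1
P(X = 1) = \frac{\binom{3}{1}\binom{2}{1}}{\binom{5}{2}} = \frac{3 \cdot 2}{10} = 0.6
```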
For small inputs it works fine, but for large inputs it is too slow. For example, one input set needs 137,544 calls to the hypergeom function, which take about 70 seconds in total. For a web-based tool, that is far too slow.
My question is: what other options do I have? I assumed gammln would be very fast, but for large numbers it chokes.
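Each hypergeom call makes nine gammln calls, and every argument is a non-negative integer. One common alternative (a sketch, not necessarily what was suggested in the replies) is to compute ln(k!) once for every k up to a known maximum and replace each gammln call with an array lookup. `$MAX` and `hypergeom_table` are hypothetical names chosen for this example, and the bound of 10,000 is an assumption that must be at least the largest `m + n` you will see:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical upper bound on any argument passed to the table.
# For hypergeom($n, $m, $N, $i) the largest index needed is $m + $n.
my $MAX = 10_000;

# Build ln(k!) for k = 0..$MAX once, as a running sum of ln(k).
my @LOGFACT = (0);    # ln(0!) = 0
$LOGFACT[$_] = $LOGFACT[$_ - 1] + log($_) for 1 .. $MAX;

# Same formula as the original hypergeom(): probability of exactly $i
# good balls when drawing $N from $n good and $m bad balls.
sub hypergeom_table {
    my ($n, $m, $N, $i) = @_;
    my $log_num = $LOGFACT[$m] + $LOGFACT[$n] + $LOGFACT[$N]
                + $LOGFACT[$m + $n - $N];
    my $log_den = $LOGFACT[$i] + $LOGFACT[$n - $i] + $LOGFACT[$m + $i - $N]
                + $LOGFACT[$N - $i] + $LOGFACT[$m + $n];
    return exp($log_num - $log_den);
}

print hypergeom_table(300, 700, 100, 40), "\n";
```

Under plain CGI the table is rebuilt on every request, which still costs only `$MAX` calls to `log`. Because the table is built from exact sums of logs rather than the Lanczos approximation, results can differ from the gammln version in the later digits.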
Another option is Inline::C, which is supposed to be fast, but I don't know whether calling a C function this often is the best approach.
Math::GSL::CDF is also an option, but I'm not sure it is fast enough. Both modules would need to be installed, and since I don't have permissions on the university server, I would like some advice before asking for them to be installed.
Any other ideas and suggestions would be welcome. This is a part of Perl I'm not very familiar with, and I would be happy to get input.
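If what is actually needed is the upper tail P(X ≥ i), which is what the original code comment describes, another idea is to compute only the first term from log factorials and get each following term from the ratio P(X = k+1) / P(X = k) = (n−k)(N−k) / ((k+1)(m−N+k+1)), so each extra term costs one multiplication instead of nine gammln calls. This is a minimal sketch under that assumption; `hypergeom_upper_tail` and `logfact_cached` are hypothetical names, and it assumes N ≤ m + n:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lazily grown table: $LF[$k] = ln(k!).
my @LF = (0);
sub logfact_cached {
    my $x = shift;
    push @LF, $LF[-1] + log(scalar @LF) while $#LF < $x;
    return $LF[$x];
}

# P(X >= $i) for X hypergeometric: $n good, $m bad, $N drawn.
sub hypergeom_upper_tail {
    my ($n, $m, $N, $i) = @_;
    my $kmax = $N < $n ? $N : $n;              # most good balls possible
    my $kmin = $N - $m > 0 ? $N - $m : 0;      # fewest good balls possible
    $i = $kmin if $i < $kmin;
    return 0 if $i > $kmax;

    # First term, P(X = $i), from log factorials (same formula as hypergeom).
    my $term = exp( logfact_cached($m) + logfact_cached($n) + logfact_cached($N)
                  + logfact_cached($m + $n - $N)
                  - logfact_cached($i) - logfact_cached($n - $i)
                  - logfact_cached($m + $i - $N) - logfact_cached($N - $i)
                  - logfact_cached($m + $n) );
    my $sum = $term;
    for my $k ($i .. $kmax - 1) {
        # P(X = k+1) / P(X = k) = (n-k)(N-k) / ((k+1)(m-N+k+1))
        $term *= ($n - $k) * ($N - $k) / (($k + 1) * ($m - $N + $k + 1));
        $sum += $term;
    }
    return $sum;
}

print hypergeom_upper_tail(300, 700, 100, 40), "\n";
```

One caveat of this design: if the first term underflows to 0 for very large inputs, every later term is 0 as well, so in that regime the terms would have to be accumulated in log space instead.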
Also, I know the thread I mentioned earlier was pretty thorough, but it was also 5 years ago and I think it is worth asking this question again.
Thanks in advance
Guy Naamati
Q:What do Alexander the Great and Kermit the Frog have in common?
A: Their middle name
Update: The solution by BrowserUk brought the time down to 5 seconds. However, the output is different, and I am now trying to figure out why.
Thanks for your help,
Guy
Problem solved. Thanks again!