Hi monks
I am writing a web-based bioinformatics program that uses the hypergeometric distribution. I use the code from this Perl node:
#!/usr/bin/perl -w
use strict;

sub logfact {
    return gammln(shift(@_) + 1.0);
}

sub hypergeom {
    # There are m "bad" and n "good" balls in an urn.
    # Pick N of them. The probability of i or more successful selections:
    # (m!n!N!(m+n-N)!)/(i!(n-i)!(m+i-N)!(N-i)!(m+n)!)
    my ($n, $m, $N, $i) = @_;
    my $loghyp1 = logfact($m)+logfact($n)+logfact($N)+logfact($m+$n-$N);
    my $loghyp2 = logfact($i)+logfact($n-$i)+logfact($m+$i-$N)+logfact($N-$i)+logfact($m+$n);
    return exp($loghyp1 - $loghyp2);
}

sub gammln {
    my $xx = shift;
    my @cof = (76.18009172947146, -86.50532032941677,
               24.01409824083091, -1.231739572450155,
               0.12086509738661e-2, -0.5395239384953e-5);
    my $y = my $x = $xx;
    my $tmp = $x + 5.5;
    $tmp -= ($x + .5) * log($tmp);
    my $ser = 1.000000000190015;
    for my $j (0..5) {
        $ser += $cof[$j]/++$y;
    }
    -$tmp + log(2.5066282746310005*$ser/$x);
}

print hypergeom(300,700,100,40),"\n";
For small inputs it works fine, but for large inputs it is too slow. For example, I have an input set that needs to call the hypergeom function 137544 times, and that takes about 70 seconds. For a web-based tool this is not acceptable.
My question is: what other options do I have? I assumed gammln would be very fast, but for large numbers it chokes.
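One pure-Perl idea that would need no module installation: hypergeom only ever takes factorials of non-negative integers, so log(k!) could be computed once for every k up to m+n and then looked up, instead of calling gammln nine times per call. Below is a minimal sketch, assuming all arguments are integers and that the largest m+n is known before the loop starts. The names build_logfact_table and hypergeom_cached are made up for illustration, and I have not timed this on the 137544-call input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# build_logfact_table and hypergeom_cached are hypothetical helper names,
# not part of the original node.

# Build a table where $table->[$k] holds log($k!) for 0 <= $k <= $max.
sub build_logfact_table {
    my ($max) = @_;
    my @table = (0);                                  # log(0!) = 0
    $table[$_] = $table[$_ - 1] + log($_) for 1 .. $max;
    return \@table;
}

# Same formula as hypergeom() above, with array lookups replacing gammln.
# Assumes every index is a non-negative integer within the table
# (i <= n, i <= N, N - i <= m); a negative index would silently wrap
# around in Perl, so callers should validate their inputs.
sub hypergeom_cached {
    my ($lf, $n, $m, $N, $i) = @_;
    my $loghyp1 = $lf->[$m] + $lf->[$n] + $lf->[$N] + $lf->[$m + $n - $N];
    my $loghyp2 = $lf->[$i] + $lf->[$n - $i] + $lf->[$m + $i - $N]
                + $lf->[$N - $i] + $lf->[$m + $n];
    return exp($loghyp1 - $loghyp2);
}

# The table must cover m + n, which is 1000 for the example call.
my $lf = build_logfact_table(1000);
print hypergeom_cached($lf, 300, 700, 100, 40), "\n";
```

The table is built once with a running sum of logs, so each later call costs nine array lookups and one exp. Because the table comes from summing logs rather than from the gammln approximation, the last few digits may differ slightly from the original function.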

Another option is Inline::C, which is supposed to be fast, but I don't know whether calling a C function that often is the best approach.
Math::GSL::CDF is also an option, but I'm not sure it is fast enough. Both modules need to be installed, and since I don't have permissions on the University server I would like some advice before asking for installation.

Any other ideas and suggestions would be welcome. This is a part of Perl I'm not very familiar with and would be happy to get input.
Also, I know the thread I mentioned earlier was pretty thorough, but it was also 5 years ago and I think it is worth asking this question again.
Thanks in advance
Guy Naamati

Q:What do Alexander the Great and Kermit the Frog have in common?

A: Their middle name
Update: Solution by BrowserUk brought the time down to 5 seconds. However, the output is different, and I am now trying to figure out why.
Thanks for your help,
Guy
Problem solved. Thanks again!

In reply to Fast hypergeometric calculation in Perl by mrguy123
