Re^6: finding open reading frames

The O(n**2) nested-loop performance is going to kill you on some datasets. For a really pathological one, try:

$sequence = 'ATG' x 1e6;

I estimate that your code would take about a day and a half to process that. My code handles it in just over a second.
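For anyone following along, here is a minimal sketch of one way to get linear behaviour. It is not necessarily the code I timed above. The sub name find_orfs, the choice of stop codons (TAA, TAG, TGA), the forward-strand-only scan, and the [start, end) return format are all assumptions made for illustration. The idea is to walk each reading frame once, remember only the first ATG seen since the last stop codon, and never rescan from later ATGs.

```perl
use strict;
use warnings;

# Hypothetical helper (name and return format are illustrative only).
# Returns a list of [start, end) offsets, one per ORF, found in a single
# left-to-right pass over each of the three forward reading frames.
sub find_orfs {
    my ($seq) = @_;
    my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);    # assumed stop codons
    my @orfs;

    for my $frame (0 .. 2) {
        my $start;    # offset of the first ATG since the last stop codon

        for (my $pos = $frame; $pos + 3 <= length $seq; $pos += 3) {
            my $codon = substr $seq, $pos, 3;

            if (!defined $start) {
                # Not inside an ORF yet: only a start codon matters.
                $start = $pos if $codon eq 'ATG';
            }
            elsif ($is_stop{$codon}) {
                # Inside an ORF and hit a stop: record it, then reset.
                push @orfs, [ $start, $pos + 3 ];
                undef $start;
            }
        }
    }
    return @orfs;
}
```

Each codon is examined exactly once per frame, so the work grows linearly with the sequence length. On 'ATG' x 1e6 this finds no complete ORF at all, because no frame contains a start codon followed by an in-frame stop.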

The human genome is around 3e9 base-pairs long. That's small enough to fit entirely in memory, but large enough that you need efficient algorithms to process it.
