Re: Large data processing ...

Well, you use the function get_exon to make the code more readable, but all that function does is manipulate the $begin and $end values. Furthermore, the slowdown comes from copying a potentially large string, $chromosome, each time get_exon is called. A better solution is a function that takes just two arguments, $begin and $end, and manipulates only those; that manipulation doesn't need $chromosome at all. So:

sub _ {$_ [0] - 1, $_ [1] - $_ [0] - 1}

while ((my $begin, my $end) =  each %exon_endpoints) {
    print substr ($chromosome => _ $begin, $end), "\n\n";
}

Of course, that still has the overhead of calling a user-defined subroutine for each pair of endpoints.
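If even that per-pair call matters, the same arithmetic can be written inline in the loop. The sketch below assumes hypothetical sample data (the string and hash contents are made up, not from the thread) and keeps the post's offset/length formula unchanged. Whether `$end - $begin - 1` is the right length depends on the original poster's coordinate convention, which isn't shown here. The keys are sorted only to make the output order reproducible; the `each` loop above works just as well.

```perl
use strict;
use warnings;

# Hypothetical sample data; in the thread these come from the poster's input.
my $chromosome     = "ACGTACGTACGTACGTACGT";
my %exon_endpoints = (3 => 10, 12 => 18);   # begin => end

my @exons;
# Sorted keys only so the output order is reproducible.
for my $begin (sort { $a <=> $b } keys %exon_endpoints) {
    my $end = $exon_endpoints{$begin};
    # Same offset/length arithmetic as sub _ above, inlined: no user-defined
    # subroutine call per pair, and $chromosome only goes to the built-in substr.
    push @exons, substr($chromosome, $begin - 1, $end - $begin - 1);
}
print "$_\n\n" for @exons;
```

The trade-off is the one discussed above: you lose the readability of a named helper in exchange for skipping one subroutine call per exon.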

Abigail
