in reply to Reducing memory footprint when doing a lookup of millions of coordinates

One improvement that should be straightforward to implement is to group everything by chromosome first.

So for example, read only those records from the file that belong to chr1. Build your %reps hash from them, then search for all the coordinates on chr1 that you find interesting.

Then empty the hash with %reps = () to free the memory, and read all the chr2 coordinates, and so on.
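A minimal sketch of that chunking, assuming a tab-separated repeats file with columns chromosome, start, end (a hypothetical layout; adjust the `split` to your actual format):

```perl
use strict;
use warnings;

# Read only the records for one chromosome into a hash
# mapping start position => end position.
sub load_chr {
    my ($fh, $chr) = @_;
    my %reps;
    while (my $line = <$fh>) {
        chomp $line;
        my ($c, $start, $end) = split /\t/, $line;
        next unless $c eq $chr;    # skip records for other chromosomes
        $reps{$start} = $end;
    }
    return \%reps;
}

# Demo with an in-memory filehandle instead of a real file:
my $data = "chr1\t10\t20\nchr2\t5\t9\nchr1\t30\t40\n";
open my $fh, '<', \$data or die $!;
my $reps = load_chr($fh, 'chr1');
close $fh;
# $reps now holds only the chr1 records: { 10 => 20, 30 => 40 }
```

In the real script you would reopen the file (or seek back to its start) once per chromosome, query the hash, and let it go out of scope before the next round.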

Another possible improvement: instead of using a hash for the coordinates, use two sorted arrays (one for the begin positions, one for the end positions), and then do a binary search in them. Two packed arrays take far less memory than a hash with millions of keys, and a binary search is still fast.
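A sketch of the binary search over parallel arrays. It assumes the arrays are sorted by begin position and that the regions do not overlap (a simplifying assumption; overlapping repeats would need extra handling):

```perl
use strict;
use warnings;

# Return the index of the region containing $pos, or -1 if none.
# $begin and $end are references to parallel arrays sorted by begin.
sub find_region {
    my ($begin, $end, $pos) = @_;
    my ($lo, $hi) = (0, $#$begin);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if    ($pos < $begin->[$mid]) { $hi = $mid - 1 }
        elsif ($pos > $end->[$mid])   { $lo = $mid + 1 }
        else                          { return $mid }  # begin <= pos <= end
    }
    return -1;
}

my @begin = (100, 500, 900);
my @end   = (199, 650, 950);
print find_region(\@begin, \@end, 600), "\n";   # 1: inside 500..650
print find_region(\@begin, \@end, 700), "\n";   # -1: in no region
```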

Of course, a relational database like SQLite or PostgreSQL can do all of these things for you if you build appropriate indexes over the columns.
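For instance, a schema along these lines (table and column names are made up; a composite index on chromosome and position lets the database narrow a coordinate lookup without scanning the whole table):

```sql
CREATE TABLE repeats (
    chrom     TEXT    NOT NULL,
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL
);
CREATE INDEX repeats_idx ON repeats (chrom, start_pos, end_pos);

-- Find the repeat region (if any) containing a coordinate:
SELECT * FROM repeats
 WHERE chrom = 'chr1'
   AND start_pos <= 12345
   AND 12345 <= end_pos;
```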