Re: Algorithms

Although I am replying to this post eons later, I would like to share my experience with using an algorithm to speed up a relatively straight forward problem. Often Bioinformaticians face the task of checking if a given position lies within a target region.
For e.g.

position target_region is_member
100       95_105          yes
110       120_130         no
[download]

and so on...
The query file contains unique records and the target file is sorted numerically.
This seemingly simple task can be a big drag when the query file is large (1 million records) against a target file say (200K). I sought tips from Mastering Algorithm in Perl.
I wanted to try the quicksort algorithm explained in this book.
First I divided the target region into two regions and searched if the query position fell in the two halves including the pivot region. Once I found the half in which the query lies, I pass it to a recursion function and each time the target region is split into two-half arrays and searched. This works and I get the expected output. But, the recursion was a big drag.
I then sub divided the target region into 16 chunks and checked in which of this smaller chunks the query region lies. Then I passed that chunk into the recursion function and this significantly increased the speed. I still wanted to tweak the code further. Originally I has stored the query file into a hash. I changed this to an array and gained more speed. I am still refining the code and hope to tweak it further.
So, here is my experience with Algorithms and speed. The problem may be trivial. But, I had to process close to 1000 such files and this approach helped me a lot! You can follow the evolution of my code at What is the best approach to check if a position falls within a target range?.

Comment on Re: Algorithms Download Code