query
=====
100
200
300
target_region
=============
10_50
60_150
180_250
expected output
===============
100 60_150
200 180_250
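For reference, the matching described above can be sketched as a brute-force scan. This is only a minimal illustration, assuming each target region is a `start_end` string and that both bounds are inclusive; the names `parse_region` and `match_queries` are made up for this sketch.

```python
def parse_region(text):
    """Split a 'start_end' string such as '60_150' into two integers."""
    start, end = text.split("_")
    return int(start), int(end)


def match_queries(queries, regions):
    """Return (query, region) pairs where start <= query <= end.

    Assumption: both ends of a region are inclusive.
    """
    parsed = [(parse_region(r), r) for r in regions]
    hits = []
    for q in queries:
        for (start, end), label in parsed:
            if start <= q <= end:
                hits.append((q, label))
    return hits


queries = [100, 200, 300]
regions = ["10_50", "60_150", "180_250"]
for q, label in match_queries(queries, regions):
    print(q, label)
```

Query 300 falls in no region, so it produces no output line. This scan compares every query against every region, which is fine for a toy example but slow for large inputs.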
Thanks Much,
Uma
FWIW, I have a solution that matches 2e6 random integers (0 .. 1000) against 200,000 randomly generated ranges (0..700, 1..300) in 45 minutes using 3GB of RAM.
Of course, of those 2e6 queries only 1000 are unique, so it's doing 2000 times more work than it needs to.
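One way to avoid that repeated work is to look each distinct query up only once and reuse the answer. The following is a rough sketch, not the solution timed above; `match_with_cache` and its `lookup` argument are hypothetical names, and `lookup` stands in for whatever range search is actually used.

```python
def match_with_cache(queries, lookup):
    """Call lookup(q) once per distinct query and reuse the result.

    `lookup` is a placeholder for the real (expensive) range search.
    """
    cache = {}
    results = []
    for q in queries:
        if q not in cache:
            cache[q] = lookup(q)
        results.append((q, cache[q]))
    return results
```

With 2e6 queries but only 1000 distinct values, the expensive search would run 1000 times instead of 2e6 times, at the cost of one dictionary entry per distinct query.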
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Hi BrowserUk,
Thanks much for all your replies. I did some major refactoring of the code these past few days and was able to achieve significant improvement in speed. Major changes are:
1. Split the target region into 24 chunks, one for each of the 24 chromosomes (chr), and loaded only the chr of interest into memory.
2. Converted the target hash to a target array. This caused a big gain in speed.
UPDATE
3. Divided the target region into 8 chunks (10-12.5-25-50 and so on) and checked which chunk the query SNP falls in before assigning target status.
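Changes 1 and 2 can be sketched roughly as follows. This is not the refactored code itself: it keeps one sorted array of regions per chromosome and, as a swapped-in technique, uses binary search instead of the chunking in change 3. It assumes regions within a chromosome do not overlap and that bounds are inclusive; `build_index` and `find_region` are hypothetical names.

```python
from bisect import bisect_right


def build_index(regions_by_chr):
    """Map each chromosome to (sorted start positions, sorted regions).

    Assumption: regions on the same chromosome do not overlap.
    """
    index = {}
    for chrom, regions in regions_by_chr.items():
        ordered = sorted(regions)
        starts = [start for start, _ in ordered]
        index[chrom] = (starts, ordered)
    return index


def find_region(index, chrom, pos):
    """Return the (start, end) region containing pos, or None."""
    if chrom not in index:
        return None
    starts, ordered = index[chrom]
    # Last region whose start is <= pos is the only possible match.
    i = bisect_right(starts, pos) - 1
    if i >= 0 and ordered[i][0] <= pos <= ordered[i][1]:
        return ordered[i]
    return None


index = build_index({"chr1": [(180, 250), (10, 50), (60, 150)]})
print(find_region(index, "chr1", 100))
print(find_region(index, "chr1", 300))
```

Each lookup touches only the array for one chromosome, and the binary search needs about log2(n) comparisons rather than a scan over all n regions.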
Uma