in reply to Re^3: Searching array against hash
in thread Searching array against hash
It will help if you are looking to retrieve a subsequence from the human genome, the FASTA file of which is about 5 Gb;
I guess things have moved on. The version I have is just under 3GB and came as 25 files: chr1 through chr22, plus chrM, chrX and chrY.
That said, if his 3 posted sequences are representative of his 900,000, then his file is a tad under 900MB.
If he can process that in "a few seconds", then he could process your 5GB file in 5-and-a-bit times "a few seconds".
But, and here is the point: it will take Bio::DB::Fasta at least that same 5-and-a-bit times "a few seconds" to construct an index before he can start processing anything. So for a one-off process, there is a net loss.
Now the real crux: given all the additional layers and overheads, how many times does he have to redo the process in order to obtain a net gain? (If ever.)
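To put rough numbers on that question, here is a minimal back-of-envelope sketch. It assumes scan time scales linearly with file size and that building the index costs at least one full pass over the file, as argued above. The 3 seconds standing in for "a few seconds" and the 0.5 seconds per indexed run are made-up placeholders, not measurements, and the function names are purely illustrative.

```python
import math


def scaled_scan_time(base_seconds, base_bytes, target_bytes):
    """Estimate a full-scan time for a bigger file, assuming time grows linearly with size."""
    return base_seconds * target_bytes / base_bytes


def break_even_runs(index_build_seconds, scan_seconds, lookup_seconds):
    """Smallest number of runs at which index-then-lookup is strictly cheaper
    than rescanning the file every time; None if that never happens."""
    saving_per_run = scan_seconds - lookup_seconds
    if saving_per_run <= 0:
        # Indexed runs are no cheaper than a plain scan: the index never pays off.
        return None
    # Need n * saving_per_run > index_build_seconds, with n a whole number of runs.
    return math.floor(index_build_seconds / saving_per_run) + 1


# Placeholder figures (assumptions, not measurements).
few_seconds = 3.0        # stand-in for "a few seconds" on the ~900MB file
small_file = 900e6       # ~900MB
big_file = 5e9           # ~5GB

scan_5gb = scaled_scan_time(few_seconds, small_file, big_file)

# Best case for the index: building it costs exactly one full scan.
runs = break_even_runs(index_build_seconds=scan_5gb,
                       scan_seconds=scan_5gb,
                       lookup_seconds=0.5)

print(f"estimated full scan of 5GB: {scan_5gb:.1f}s")
print(f"runs needed before the index shows a net gain: {runs}")
```

Even in that best case, the first run is a net loss. Every extra second of index build time, module load time or per-lookup overhead pushes the break-even point further out, and if an indexed run is no faster than a plain scan, it never arrives.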
Then add to that the (potential) problems with installation, the learning curve of finding your way around the documentation for 897 modules to find the one that you want, and then learning how to use it to do what you want. Suddenly it becomes clear why so many bioinformaticians are looking for Lite alternatives to the Bio::Behemoth, and for simple procedures to get their work done, rather than becoming technical debt slaves to the byzantine Bio::Empire.