"It will help if you are looking to retrieve a subsequence from the human genome, the FASTA file of which is about 5 Gb;"
I guess things have moved on. The version I have is just under 3GB and came as 25 files: chr(1-22, M, X, Y).
That said, if his 3 posted sequences are representative of his 900,000, then his file is a tad under 900MB.
And if he can process that in "a few seconds", he could process your 5GB file in roughly 5-and-a-bit times "a few seconds".
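For concreteness, a single streaming pass over the FASTA files needs nothing beyond core Perl. The sketch below is only an assumption about what such a pass might look like; the glob pattern and the per-record work are placeholders, not his actual code:

    #!/usr/bin/perl
    # Minimal sketch of one streaming pass over the FASTA files -- the
    # "simple procedure" case. Glob pattern and per-record work are
    # placeholders; the OP's real code is not shown in this thread.
    use strict;
    use warnings;

    my @files = glob 'chr*.fa';    # e.g. the 25 chr(1-22, M, X, Y) files

    for my $file ( @files ) {
        open my $fh, '<', $file or die "open '$file': $!";
        local $/ = "\n>";          # read one FASTA record per iteration
        while ( my $rec = <$fh> ) {
            chomp $rec;            # strips the trailing "\n>" separator
            $rec =~ s/^>//;        # strip the leading '>' on the first record
            my ( $header, @lines ) = split /\n/, $rec;
            my $seq = join '', @lines;
            # ... per-record processing would go here ...
        }
        close $fh;
    }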
But, and here is the point: it will take Bio::DB::Fasta at least that same 5-and-a-bit times "a few seconds" to construct its index before he can start processing anything. So for a one-off process, there is a net loss.
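For comparison, the Bio::DB::Fasta route looks roughly like this (a sketch along the lines of the module's documented synopsis; the directory, sequence id, and coordinates are placeholders). The new() call is where the index gets built on first use, which is the up-front cost in question; later runs reuse that index for fast random access:

    use strict;
    use warnings;
    use Bio::DB::Fasta;

    # new() indexes the file or directory the first time it is called;
    # that is the up-front cost discussed above.
    my $db  = Bio::DB::Fasta->new('/path/to/genome_dir');

    # After that, subsequences come back via random access.
    my $sub = $db->seq( 'chr7', 1_000_000 => 1_000_100 );
    print "$sub\n";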
Now the real crux: given all the additional layers and overheads, how many times would he have to redo the process in order to obtain a net gain? (If ever.)
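One way to make that question concrete is a back-of-envelope break-even calculation. Every timing below is a placeholder, not a measurement; the point is that the answer depends entirely on what those numbers really are for his data and his queries:

    use strict;
    use warnings;

    # Placeholder timings in seconds -- assumptions, not measurements.
    my $t_scan  = 15;   # one plain streaming pass over the whole file
    my $t_index = 15;   # one-off cost of building the Bio::DB::Fasta index
    my $t_run   = 5;    # each subsequent indexed run, overheads included

    # Indexing wins once  $t_index + $n * $t_run  <  $n * $t_scan.
    if ( $t_run >= $t_scan ) {
        print "No number of repeat runs ever pays back the index.\n";
    }
    else {
        my $n = 1 + int( $t_index / ( $t_scan - $t_run ) );
        print "Indexing only pays off after $n or more runs.\n";
    }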
Then add to that the (potential) problems with installation, plus the learning curve of finding your way around the documentation for 897 modules to locate the one you want, and then of learning how to use it to do what you want; and suddenly you can see why so many bioinformaticians are looking for Lite alternatives to the Bio::Behemoth and simple procedures to get their work done, rather than becoming technical-debt slaves to the byzantine Bio::Empire.
In reply to Re^4: Searching array against hash by BrowserUk, in thread Searching array against hash by drhicks