bioinformatics has asked for the wisdom of the Perl Monks concerning the following question:
>gi|2695846|emb|Y13255.1|ABY13255 Acipenser baeri mRNA for immunoglobulin heavy chain, clone ScH 3.3
TGGTTACAACACTTTCTTCTTTCAATAACCACAATACTGCAGTACAATGGGGATTTTAACAGCTCTCTGTATAATAATGA
CAGCTCTATCAAGTGTCCGGTCTGATGTAGTGTTGACTGAGTCCGGACCAGCAGTTATAAAGCCTGGAGAGTCCCATAAA
CTGTCCTGTAAAGCCTCTGGATTCACATTCAGCAGCGCCTACATGAGCTGGGTTCGACAAGCTCCTGGAAAGGGTCTGGA
ATGGGTGGCTTATATTTACTCAGGTGGTAGTAGTACATACTATGCCCAGTCTGTCCAGGGAAGATTCGCCATCTCCAGAG
ACGATTCCAACAGCATGCTGTATTTACAAATGAACAGCCTGAAGACTGAAGACACTGCCGTGTATTACTGTGCTCGGGGC
GGGCTGGGGTGGTCCCTTGACTACTGGGGGAAAGGCACAATGATCACCGTAACTTCTGCTACGCCATCACCACCGACAGT
GTTTCCGCTTATGGAGTCATGTTGTTTGAGCGATATCTCGGGTCCTGTTGCTACGGGCTGCTTAGCAACCGGATTCTGCC
TACCCCCGCGACCTTCTCGTGGACTGATCAATCTGGAAAAGCTTTT

This flat file is composed of consecutive FASTA sequences (like the one above) and approaches 9 GB in size. What I am trying to do is slurp the entire file into an array (crazy, but my Unix system can handle it :-) and then parse out the individual sequences into subarrays. Because the sequences vary in size, I can't use @subarray=splice(@list, 0, 11); to pull out each one; I have to separate the sequences based on the > symbol.

What would be the simplest way to say "slurp in the multiple lines of data between > and > and place them into subarray X"? My main worry is coding this so that Perl keeps its place along the way, so I don't get the same sequence pulled out 20,000 times rather than 20,000 different sequence arrays. Also, any suggestions on making this as painless a memory hog as possible would be greatly appreciated.

I apologize if this seems a dumb question, but I'm a self-taught Perl hacker, and still pretty new, though playing with some nice Unix toys....
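For illustration only, here is a minimal sketch of one way the "everything between > and >" split could be done without slurping the whole file. It assumes every record starts with > at the beginning of a line, and it sets Perl's input record separator $/ to "\n>" so that each read returns exactly one record. The helper name fasta_iterator and the two-record sample data are made up for the example; nothing here comes from the replies in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a closure that yields
# one FASTA record per call as [ $header, @sequence_lines ], or nothing at
# end of input. The filehandle remembers its position, so no record can be
# returned twice.
sub fasta_iterator {
    my ($fh) = @_;
    return sub {
        local $/ = "\n>";        # a record ends where the next header line begins
        while (defined(my $chunk = <$fh>)) {
            chomp $chunk;        # drop the trailing "\n>" separator, if present
            $chunk =~ s/^>//;    # the first record still carries its leading ">"
            my ($header, @lines) = grep { length } split /\r?\n/, $chunk;
            next unless defined $header;    # skip empty chunks
            return [ $header, @lines ];
        }
        return;
    };
}

# Demo on made-up data, read through an in-memory filehandle.
my $data = ">seq1 first\nACGT\nTTGA\n>seq2 second\nGGCC\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $next = fasta_iterator($fh);
my @records;
while (my $rec = $next->()) {
    my ($header, @lines) = @$rec;
    push @records, $rec;    # demo only; see the note below for large files
    printf "%s: %d residues\n", $header, length(join('', @lines));
}
close $fh;
```

For a real file, the filehandle would come from open my $fh, '<', $path instead. With a 9 GB input it is probably kinder to memory to process or write out each record inside the loop rather than push them all onto @records, since only one record is then held at a time.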

Replies are listed 'Best First'.

Re: Refomating a large fasta file...
by Itatsumaki (Friar) on Nov 19, 2003 at 00:29 UTC
by bioinformatics (Friar) on Nov 19, 2003 at 18:06 UTC
by Itatsumaki (Friar) on Nov 19, 2003 at 18:19 UTC

Re: Refomating a large fasta file...
by Roger (Parson) on Nov 18, 2003 at 23:28 UTC

Re: Refomating a large fasta file...
by duff (Parson) on Nov 18, 2003 at 23:12 UTC
by Anonymous Monk on Nov 19, 2003 at 21:04 UTC

Re: Refomating a large fasta file...
by BrowserUk (Patriarch) on Nov 18, 2003 at 23:39 UTC

Re: Refomating a large fasta file...
by Hena (Friar) on Nov 19, 2003 at 10:21 UTC