Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hello, I have a FASTA file with about 400 DNA sequences. I'm trying to come up with a Perl script that removes all sequences shorter than 500 nucleotides and writes the sequences longer than 500 nucleotides to a new file. Here is an example of what the FASTA file looks like:
>C1_A01_R.trimmed.seq (Quality-trimmed) Agencourt Bioscience Corporation ABI
GGCCGCCAGTGTGCTGGAATCCGCCCTTAACCTGGTTGATCCCGCCAGTAGTCATACGCT
CGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAACTCTTTTACTTTGAAAACTGCGAA
CGGCTCATTATATCAGTTATAGTTTATTTGATAGTCCCTTACTACTTGGATACCCGTAGT
AATTCTAGAGCTAATACATGCATCAATACCCAACTGTTCGCGGAAGGGTAGTATTTATTA
GGTATAGACCAACCGTCTTCGGACGTGCTTTGGTGATTCATAATAACTTTTCGAATCGCA
TGGCTCCATGCCGGCGATGGATCATTCAAGTTTCTGCCCTATCAGCTTTGG
>C1_A03_R.trimmed.seq (Quality-trimmed) Agencourt Bioscience Corporation ABI
CCGAAGTAATTCTAGAGCTAATACATGCA
>C1_A04_R.trimmed.seq (Quality-trimmed) Agencourt Bioscience Corporation ABI
TAGTAACGGCCGCCAGTGTGCTGGAATTCGCCCTTAACCTGGTTGATCCTGCCAGTAGTC
ATACGCTCGTCTCAAAGATTAGGCCATGCATGTCTAAGTATAACTCTTTTACTTTGAAAA
CTGCGAACGGCTCATTATATCAGTTATAGTTTATTTGATAGTCCCTTACTACTTGGATAC
CCGTAGTAATTCTAGAGCTAATACATGCATCAATACCCGACTGTTCGCGGAAGGGTAGTA
TTTATTAGGTATAGACCAACCGTCTTCGGACGTGCTTTGGTGATTCATAATAACTTTTCG
AATCGCATGGCTCCATGCCGGCGATGGATCATTCAAGTTTCTGCCCTATCAGCTTTGGAT
GGTAGTGTATTGGACTACCATGGCTTTAACGGGTAACGAATTGTTAGGGCAAGATTTCGG
AGAGGGAGCCTGAGAGACGGCTACCACATCCAAGGAAGGCAGCGGGCGCGTAAATTACCC

Does anyone out there have any scripting suggestions?
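
For anyone landing on this question, here is a minimal sketch of one way to do the filtering in plain Perl, not a definitive solution. It assumes each record starts with a `>` header line and that the sequence may wrap over several lines. The sub name `filter_fasta`, the `$MIN_LENGTH` variable, and the file names in the usage comment are hypothetical. It also assumes a sequence of exactly 500 nucleotides should be kept, which the question leaves open.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed threshold: keep sequences with at least this many nucleotides.
my $MIN_LENGTH = 500;

# Read FASTA records from filehandle $in and print to filehandle $out
# only those whose sequence has at least $min_len non-whitespace
# characters. Returns the number of records written.
sub filter_fasta {
    my ($in, $out, $min_len) = @_;
    my ($header, @lines);
    my $kept = 0;

    # Write the record collected so far if it is long enough.
    my $flush = sub {
        return unless defined $header;
        my $residues = join '', @lines;
        $residues =~ s/\s+//g;              # ignore spaces inside the sequence
        if (length($residues) >= $min_len) {
            print {$out} join("\n", $header, @lines), "\n";
            $kept++;
        }
    };

    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;               # strip Unix or Windows line ending
        next if $line =~ /^\s*$/;           # skip blank lines
        if ($line =~ /^>/) {                # a new header closes the previous record
            $flush->();
            $header = $line;
            @lines  = ();
        }
        elsif (defined $header) {
            push @lines, $line;             # keep the original line wrapping
        }
    }
    $flush->();                             # do not forget the last record
    return $kept;
}

# Run as a script only when both file names are given, e.g.
#   perl filter_fasta.pl input.fasta output.fasta
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in,  '<', $in_file  or die "Cannot read $in_file: $!";
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    my $kept = filter_fasta($in, $out, $MIN_LENGTH);
    close $out or die "Cannot close $out_file: $!";
    print "Kept $kept sequence(s) of at least $MIN_LENGTH nucleotides\n";
}
```

Change `>=` to `>` in the length test if sequences of exactly 500 nucleotides should be dropped. For anything beyond simple length filtering, BioPerl's `Bio::SeqIO` is probably a more robust choice than hand-rolled parsing.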
Replies are listed 'Best First'.

Re: Parse DNA fasta file
by Anonymous Monk on Oct 29, 2009 at 21:05 UTC

Re: Parse DNA fasta file
by ack (Deacon) on Oct 30, 2009 at 02:46 UTC

Re: Parse DNA fasta file
by arun_kom (Monk) on Oct 30, 2009 at 17:56 UTC
by william.orsi (Initiate) on Nov 11, 2009 at 17:03 UTC
by BrowserUk (Patriarch) on Nov 11, 2009 at 17:21 UTC
by arun_kom (Monk) on Nov 12, 2009 at 09:07 UTC

Re: Parse DNA fasta file
by BioLion (Curate) on Oct 30, 2009 at 17:27 UTC