I agree with the above: there are plenty of modules to help with parsing FASTA. The best documented and supported are in BioPerl, though: http://www.bioperl.org/wiki/FASTA_sequence_format
Or you can set the input record separator to the FASTA record marker (>) with $/ = '>'; and then read in the FASTA file one record at a time.
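A minimal sketch of the $/ approach, in case it helps. The helper name read_fasta and the in-memory demo data are made up for illustration; a real script would open your FASTA file instead. It assumes '>' only ever appears at the start of a header line.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA from an open filehandle and return
# a list of [header, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    local $/ = '>';    # each read now stops at the next '>'
    my @records;
    while ( my $chunk = <$fh> ) {
        chomp $chunk;                  # strips the trailing '>' (the current $/)
        next unless $chunk =~ /\S/;    # first read is just the text before the first '>'
        my ( $header, @lines ) = split /\n/, $chunk;
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;              # drop any stray whitespace
        push @records, [ $header, $seq ];
    }
    return @records;
}

# Demo on an in-memory "file" (assumed data, not from a real FASTA file).
my $fasta = ">seq1 example\nAGCTGA\nTCGTAA\n>seq2\nTAGAGCTA\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my @records = read_fasta($fh);
close $fh;

printf "%s: %s\n", @$_ for @records;
```

Using local keeps the change to $/ inside the sub, so the rest of the script still reads line by line. If your headers can contain a '>' character, this simple split will cut them in the wrong place and a proper parser is the safer choice.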
As far as reverse complementing goes, I am sure there is support out there, but it is very simple to do yourself:
use strict;
use warnings;

my $seq = 'AGCTGATCGTAATAGAGCTA';
my $rev = rev_comp($seq);
print "The reverse complement of '$seq' is '$rev'\n";

sub rev_comp {
    my $in = shift;

    ## reverse it
    my $rev_comp = reverse($in);

    ## complement it using tr/// (tr has no /i flag, so list both cases);
    ## tr/// returns the number of characters it translated
    my $count = ( $rev_comp =~ tr/AGCTagct/TCGAtcga/ );
    ## or tr/AGCTNagctn/TCGANtcgan/ if you have N's

    ## check we changed all the bases, or did something weird occur...
    if ( $count == length($rev_comp) ) {
        return $rev_comp;
    }
    else {
        ## not all bases were changed, so something weird is happening:
        ## maybe you still have whitespace, or N's etc...
        ## (dying here is one way to signal the error)
        die "rev_comp: unexpected characters in sequence '$in'\n";
    }
}
Obviously this is a lengthy way of doing it, but I am trying to keep things clear! Hope it helps.
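For comparison, the same reverse-then-translate idea fits in a couple of lines if you skip the sanity check. This is only a sketch: the name rev_comp_short is made up, and it assumes the input contains nothing but A, C, G, T or N in either case (anything else is passed through unchanged).

```perl
use strict;
use warnings;

# Hypothetical compact version: reverse the string, then complement it in place.
sub rev_comp_short {
    my ($seq) = @_;
    ( my $rc = reverse $seq ) =~ tr/ACGTNacgtn/TGCANtgcan/;
    return $rc;
}

print rev_comp_short('AACG'), "\n";    # prints CGTT
```

The longer version is still the better choice when you are not sure the input is clean, since the count returned by tr/// tells you when something unexpected slipped through.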
In reply to Re^2: Extracting DNA sequences from FASTA files
by BioLion
in thread Extracting DNA sequences from FASTA files
by statsman5