I agree with the above: there are plenty of modules to help with parsing FASTA, though the best documented and supported are in BioPerl: http://www.bioperl.org/wiki/FASTA_sequence_format Alternatively, you can set the input record separator to the FASTA record separator (>) with $/ = '>'; and slurp in the FASTA one record at a time.
As far as reverse complementing goes, I am sure there is support out there, but it is very simple to do yourself:
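To make the $/ = '>' idea concrete, here is a minimal sketch. The read_fasta name and the in-memory test data are made up for illustration, and it assumes that '>' never appears inside a header line (if it might, use the BioPerl parser instead):

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle, using '>'
# as the input record separator. Returns a list of [header, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    local $/ = '>';    # localised, so the default separator is restored afterwards
    my @records;
    while ( my $record = <$fh> ) {
        chomp $record;                  # strips the trailing '>' (the current $/)
        next unless $record =~ /\S/;    # skip the empty chunk before the first '>'
        my ( $header, @lines ) = split /\n/, $record;
        my $sequence = join '', @lines;
        $sequence =~ s/\s+//g;          # drop any stray whitespace or carriage returns
        push @records, [ $header, $sequence ];
    }
    return @records;
}

# Usage with an in-memory filehandle; a real script would open a file instead.
my $fasta = ">seq1 first example\nAGCTGATC\nGTAATAG\n>seq2\nTTGACA\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my @records = read_fasta($fh);
close $fh;

print "$_->[0]: $_->[1]\n" for @records;
```

Each record arrives as the header line followed by the sequence lines, so splitting on newlines and joining everything after the first line gives the full sequence.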
use strict;
use warnings;

my $seq = 'AGCTGATCGTAATAGAGCTA';
my $rev = rev_comp($seq);
die "Could not reverse complement '$seq'\n" unless defined $rev;
print "The reverse complement of '$seq' is '$rev'\n";

sub rev_comp {
    my $in = shift;
    ## reverse it
    my $rev_comp = reverse($in);
    ## complement it using tr///, which returns the number of characters it translated
    ## (tr/// has no /i modifier, so both cases are listed explicitly)
    my $count = $rev_comp =~ tr/AGCTagct/TCGAtcga/; ## or tr/AGCTNagctn/TCGANtcgan/ if you have N's
    ## check we changed all the bases, or did something weird occur...
    if ($count == length($rev_comp)) {
        return $rev_comp;
    } else {
        ## not all bases were changed,
        ## so something weird is happening
        ## maybe you still have whitespace, or N's etc...
        return;    ## returns undef in scalar context, which the caller checks for
    }
}
Obviously this is a lengthy way of doing it, but I am trying to keep things clear! Hope it helps.
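For comparison, a shorter sketch of the same idea, assuming the input is already clean (only A, C, G and T in either case, no whitespace or N's), so the sanity check is dropped:

```perl
use strict;
use warnings;

my $seq = 'AGCTGATCGTAATAGAGCTA';

# reverse in scalar context reverses the string itself
my $rev = reverse $seq;

# complement every base in place, upper and lower case
$rev =~ tr/ACGTacgt/TGCAtgca/;

print "$rev\n";    # prints TAGCTCTATTACGATCAGCT
```

Any character outside the tr/// list passes through unchanged here, so this version silently accepts bad input, which is exactly what the count check in the longer version guards against.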