in reply to Extracting DNA sequences from FASTA files

See Tie::File::AnyData::Bio::Fasta, FASTA Splitter

Re^2: Extracting DNA sequences from FASTA files
by BioLion (Curate) on Jun 29, 2009 at 08:58 UTC

    I agree with the above: there are plenty of modules to help with parsing FASTA, though the best documented and supported are in BioPerl:
    http://www.bioperl.org/wiki/FASTA_sequence_format

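    Or you can set Perl's input record separator ($/) to the FASTA record marker (>):
    $/ = '>';
    and read the FASTA file one record at a time. Each read then returns everything up to the next '>', i.e. one header line plus its sequence lines.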
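A minimal sketch of that record-at-a-time approach, assuming no header line contains a literal '>' after its first character. The helper name read_fasta and the in-memory sample data are made up for illustration; they are not from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [header, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    local $/ = '>';    # a "line" now ends at the next '>'
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;                  # chomp strips the trailing '>' (the current $/)
        next unless $chunk =~ /\S/;    # skip the empty chunk before the first '>'
        my ($header, @lines) = split /\n/, $chunk;
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;              # drop stray whitespace and carriage returns
        push @records, [$header, $seq];
    }
    return @records;
}

# Usage with an in-memory filehandle (sample data invented for the example)
my $fasta = ">seq1 test\nAGCT\nGATC\n>seq2\nTTAA\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory handle: $!";
my @records = read_fasta($fh);
close $fh;
print "$_->[0]\t$_->[1]\n" for @records;
```

Using local keeps the change to $/ confined to the subroutine, so other reads in the program still work line by line.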

    As far as reverse complementing goes, I am sure there is support out there, but it is very simple to do yourself:

    my $seq = 'AGCTGATCGTAATAGAGCTA';
    my $rev = rev_comp($seq);
    print "The reverse complement of '$seq' is '$rev'\n";

    sub rev_comp {
        my $in = shift;
        ## reverse it
        my $rev_comp = reverse($in);
        ## complement it using tr/// (tr has no /i modifier, so list both cases)
        ## or tr/AGCTNagctn/TCGANtcgan/ if you have N's
        my $count = ($rev_comp =~ tr/AGCTagct/TCGAtcga/);
        ## check we changed all the bases, or did something weird occur...
        if ($count == length($rev_comp)) {
            return $rev_comp;
        }
        else {
            ## not all bases were changed, so something weird is happening;
            ## maybe you still have whitespace, or N's etc...
            die "rev_comp: unexpected characters in '$in'\n";
        }
    }

    Obviously this is a lengthy way of doing it, but I am trying to keep things clear! Hope it helps.
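For comparison, a compact sketch of the same idea without the sanity check. The name rev_comp_short is invented here, and it assumes the input contains only A, C, G, T or N in either case; any other character is passed through unchanged.

```perl
use strict;
use warnings;

# Hypothetical compact variant: no validation of the input.
sub rev_comp_short {
    my ($seq) = @_;
    my $rc = reverse $seq;               # scalar assignment, so reverse flips the string
    $rc =~ tr/ACGTNacgtn/TGCANtgcan/;    # complement each base, preserving case
    return $rc;
}

my $short = rev_comp_short('AGCTGATCGTAATAGAGCTA');
print "$short\n";    # prints TAGCTCTATTACGATCAGCT
```

The trade-off is that malformed input (whitespace, IUPAC ambiguity codes other than N) goes unnoticed, which is what the count check in the longer version guards against.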
