Re: Extracting DNA sequences from FASTA files

Replies are listed 'Best First'.

Re^2: Extracting DNA sequences from FASTA files
by BioLion (Curate) on Jun 29, 2009 at 08:58 UTC

I agree with the above: there are plenty of modules to help with parsing FASTA, though the best documented and supported are in BioPerl:
http://www.bioperl.org/wiki/FASTA_sequence_format

Or you can set the input record separator ($/) to the FASTA record delimiter (>):
$/ = '>';
and read in the FASTA file one record at a time.
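
A minimal sketch of that record-separator approach, in case it helps. The helper name read_fasta and the sample data are made up for illustration, and it assumes no '>' appears inside a header line. It reads from an in-memory string so it runs without a file:

```perl
use strict;
use warnings;

# Hypothetical helper: turn FASTA text on an open filehandle into
# a list of [header, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    local $/ = '>';                    # each read now stops at the next '>'
    while (my $chunk = <$fh>) {
        chomp $chunk;                  # chomp strips the trailing '>' (the current $/)
        next unless $chunk =~ /\S/;    # the chunk before the first '>' is empty
        my ($header, @lines) = split /\n/, $chunk;
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;              # drop stray whitespace from wrapped lines
        push @records, [$header, $seq];
    }
    return @records;
}

# Demo on made-up data held in a string instead of a real file
my $fasta = ">seq1 example\nAGCTGATC\nGTAATAGA\n>seq2\nTTGACA\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my @records = read_fasta($fh);
close $fh;

printf "%s\t%s\n", @$_ for @records;
```

Using local keeps the change to $/ confined to the sub, so the rest of the script still reads line by line.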

As far as reverse complementing goes, I am sure there is support out there, but it is very simple to do yourself:

use strict;
use warnings;

my $seq = 'AGCTGATCGTAATAGAGCTA';
my $rev = rev_comp($seq);
print "The reverse complement of '$seq' is '$rev'\n";

sub rev_comp {
  my $in = shift;
  ## reverse it (scalar context reverses the string)
  my $rev_comp = reverse($in);
  ## complement it using tr/// (tr has no /i modifier, so list both cases)
  my $count = $rev_comp =~ tr/AGCTagct/TCGAtcga/; ## or tr/AGCTNagctn/TCGANtcgan/ if you have N's
  ## check we changed all the bases, or did something weird occur...
  if ($count == length($rev_comp)) {
    return $rev_comp;
  } else {
    ## not all bases were changed,
    ## so something weird is happening
    ## maybe you still have whitespace, or N's etc...
    die "rev_comp: unexpected characters in sequence '$in'\n";
  }
}

Obviously this is a lengthy way of doing it, but I am trying to keep things clear! Hope it helps.
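
For comparison, a shorter sketch of the same idea with no sanity check. The name rev_comp_short is made up for illustration, and it assumes the input contains only A, C, G, T or N in either case; anything else passes through unchanged:

```perl
use strict;
use warnings;

# Hypothetical compact variant: reverse, then complement in place.
sub rev_comp_short {
    my ($seq) = @_;
    # The assignment puts reverse in scalar context, so it reverses the string;
    # tr/// then complements the copy held in $rc.
    (my $rc = reverse $seq) =~ tr/ACGTNacgtn/TGCANtgcan/;
    return $rc;
}

print rev_comp_short('AGCTGATCGTAATAGAGCTA'), "\n";   # prints TAGCTCTATTACGATCAGCT
```

The trade-off is that bad input goes unnoticed, which is why the longer version counts the translated characters.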

Just a something something...
