RobertCraven has asked for the wisdom of the Perl Monks concerning the following question:

Almighty Monks,

I am trying to translate a DNA sequence to a protein sequence using all 3 ORFs (same strand). However, when trying to pass options to the sequence object (like -frame => 1 or -complete => 1), the option is introduced into the translated output:

Source:
use strict;
use warnings;
use Data::Dumper;
use Bio::Tools::CodonTable;
use Bio::Seq;

my $dna = join '', qw(
    ATGAAAGGAACATCCATTTTATTCAAAGCACCTCCAAACCTGCAATCCTAAGTTCCAGG
    CAACTCAATCCCAAAAATCCACTGTAGATGCCCAAAGGCTGGGGTGTTCGGTCTTCAACATTTTTGCCT
    TTGTGGCTCCCAGTCAAGATAGAGCTGCACCAAGTCCAATTCCATTCCTCATCACAGATGATTTTTTCT
    ACTTTAAGATCAGAACTATACAAGCTTCTTGCTTTGTGTCAGCATGCTGTTGTACCCATGGGCAAATTC
    TTAGGTAAGACAAAAACACAGTCCCAAGGGCAGGTAGTAATTTTTTCAGAAAAAGGTAAGGCAATCATT
    TATCTCAGTCTGCCCAGGACAGTCCCAATTTACACATGTATATTCTCCCAATCTGTAGGCTGTCTTTTC
    ATTTTGTTGATTATTTCACTTAATTTTTTATTATTTATTTATTTTATAGAGACAGATCTCATTATGTTG
    CCCAGGGTGATCCTTGATCTCCTGGCCTCAAGTGATCCTCCAACCTTGGTCTCCCAAAGTGCTGGGATT
    ACAGATGTGAACTACCACACCCAGTCAACGTGCAGAAGGTTTTCAGTTTGATGTAGTCTGATGTAGTCT
    CATGTATTTATCCTTCTTGTTGTTGCCTGAGCTTTTGGTGTGATATCCAAAAATATCATTGCCAAGATC
    AATATCAAGAAACTTTCCCCCTATGTTTCTTACAGAAATTTTATGGTTTCAGATTTTTCATCCATTTTG
    AGTATATTTGTGTGTATGATGTAAGATAAGGGTCCAGTCTCCCCAGTGTTGGATATCCAATTTTCATAA
    CACCATTTATTGAAGAGATTATTCTTTCTCCACTGTGTTTTCTTGATGTCCTTGTCAAAAATTAGTTGA
    CTTTTATATGCTTGGGTTTATTTCTGGGCTCTATTCTGTTTCATTGCTTTACATCTCTGTTTTCATGCC
    AGTGCCACAGTGTTTTGATTACTATAGCTTTGTAATATAATTTGAAATCAGAATGTGTAATACCTATAA
    CTTTGTTTTTTGCTCTAAAGATTTATTTATTTATTTATTTTTGCCATTTCAGGTCTTTTGTGGTTTCAT
    ATGAATTTCAGAATTGTTTTTCCTATTTCTGTGAAAAATGCCATTGACATTTTGATAGGGATTGTGTTG
    AATCTATATATTGCTTTGGATAGTATGGATG
);

my $seq_obj = Bio::Seq->new( -seq => $dna, -alphabet => 'dna' );
my $prot_obj = $seq_obj->translate(-complete => 1);
print $prot_obj->seq, "\n";
Output:
MKGTSILFKAPPNLQS-completeVPGNSIPKIHCRCPKAGVFGLQHFCLCGSQSR-completeSCTKSNSIPHHR-completeFFLL-completeDQNYTSFLLCVSMLLYPWANS-completeVRQKHSPKGR-complete-completeFFQKKVRQSFISVCPGQSQFTHVYSPNL-completeAVFSFC-completeLFHLIFYYLFIL-completeRQISLCCPG-completeSLISWPQVILQPWSPKVLGLQM-completeTTTPSQRAEGFQFDVV-completeCSLMYLSFLLLPELLV-completeYPKISLPRSISRNFPPMFLTEILWFQIFHPF-completeVYLCV-completeCKIRVQSPQCWISNFHNTIY-completeRDYSFSTVFS-completeCPCQKLVDFYMLGFISGLYSVSLLYISVFMPVPQCFDYYSFVI-completeFEIRMCNTYNFVFCSKDLFIYLFLPFQVFCGFI-completeISELFFLFL-completeKMPLTF-complete-completeGLC-completeIYILLWIVWM
Versions:
Perl version v5.10.0
BioPerl 1.2.3

How do I correctly pass an option?
Any help is greatly appreciated!

Replies are listed 'Best First'.
Re: BioPerl translate sequence
by educated_foo (Vicar) on May 13, 2011 at 13:51 UTC
    It looks like it's using "-complete" to represent the stop codon. As is usually the case, for simple string manipulation it's easier and faster to take BioPerl out of the equation. All you need is a substitution and a lookup table:
    # Standard genetic code; '_' marks a stop codon.
    my %prot = (
        'TCA'=>'S','TCC'=>'S','TCG'=>'S','TCT'=>'S',
        'TTC'=>'F','TTT'=>'F',
        'TTA'=>'L','TTG'=>'L',
        'TAC'=>'Y','TAT'=>'Y',
        'TAA'=>'_','TAG'=>'_','TGA'=>'_',
        'TGC'=>'C','TGT'=>'C',
        'TGG'=>'W',
        'CTA'=>'L','CTC'=>'L','CTG'=>'L','CTT'=>'L',
        'CCA'=>'P','CCC'=>'P','CCG'=>'P','CCT'=>'P',
        'CAC'=>'H','CAT'=>'H',
        'CAA'=>'Q','CAG'=>'Q',
        'CGA'=>'R','CGC'=>'R','CGG'=>'R','CGT'=>'R',
        'ATA'=>'I','ATC'=>'I','ATT'=>'I',
        'ATG'=>'M',
        'ACA'=>'T','ACC'=>'T','ACG'=>'T','ACT'=>'T',
        'AAC'=>'N','AAT'=>'N',
        'AAA'=>'K','AAG'=>'K',
        'AGC'=>'S','AGT'=>'S',
        'AGA'=>'R','AGG'=>'R',
        'GTA'=>'V','GTC'=>'V','GTG'=>'V','GTT'=>'V',
        'GCA'=>'A','GCC'=>'A','GCG'=>'A','GCT'=>'A',
        'GAC'=>'D','GAT'=>'D',
        'GAA'=>'E','GAG'=>'E',
        'GGA'=>'G','GGC'=>'G','GGG'=>'G','GGT'=>'G',
    );

    sub dna2prot {
        my $dna = uc shift;
        $dna =~ y/ACGT//cd;    # delete every character that is not A, C, G or T
        # One translation per forward frame (offsets 0, 1, 2); ''. forces a copy of each substr.
        map { s/(...)/$prot{$1}||'?'/eg; $_ }
            $dna, ''.substr($dna,1), ''.substr($dna,2);
    }

    # $sequence holds the DNA string to translate
    print "$_\n" for dna2prot($sequence);
      Thank you for the work of writing the hash, but unfortunately this code snippet will only be the beginning. There will be different codons involved, BLAST, etc. Instead of reinventing the wheel yet again, I'd like to use existing methods this time.
      Thanks again!
Re: BioPerl translate sequence
by Anonymous Monk on May 13, 2011 at 13:56 UTC
    However, when trying to pass options to the sequence object (like -frame => 1 or -complete => 1), the option is introduced into the object: How do I correctly pass an option? Any help is greatly appreciated!

    What documentation are you referencing? See http://doc.bioperl.org/releases/bioperl-1.4/Bio/Perl.html#POD8
     Title   : translate
     Usage   : $seqobj = translate($seq_or_string_scalar)
    
     Function: translates a DNA sequence object OR just a plain
               string of DNA to amino acids
     Returns : A Bio::Seq object
    
     Args    : Either a sequence object or a string of
               just DNA sequence characters
    So Bio::Seq->translate doesn't expect any kind of complete option
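
A minimal sketch of the likely mechanism, assuming (as the output above suggests) that translate in this old release reads its arguments by position, with the stop character first. The function translate_positional and its tiny codon table are hypothetical stand-ins written for illustration; this is not BioPerl code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a translate() that takes POSITIONAL arguments
# (stop character, unknown character, frame). Only a few codons are mapped.
my %codon = (
    ATG => 'M', AAA => 'K', GGA => 'G',
    TAA => '*', TAG => '*', TGA => '*',
);

sub translate_positional {
    my ($dna, $stop, $unknown, $frame) = @_;
    $stop    = '*' unless defined $stop;
    $unknown = 'X' unless defined $unknown;
    $frame   = 0   unless defined $frame;

    my $protein = '';
    # Walk the sequence in complete triplets, starting at the frame offset.
    for (my $i = $frame; $i + 3 <= length $dna; $i += 3) {
        my $aa = $codon{ substr($dna, $i, 3) };
        $aa = $unknown unless defined $aa;
        $protein .= $aa eq '*' ? $stop : $aa;
    }
    return $protein;
}

my $dna = 'ATGAAAGGATAAATG';

# Named-style call: the string '-complete' lands in the $stop slot, 1 in $unknown.
my $named = translate_positional($dna, -complete => 1);

# Positional call: defaults for stop and unknown, frame 0.
my $positional = translate_positional($dna, undef, undef, 0);

print "$named\n";       # MKG-completeM
print "$positional\n";  # MKG*M
```

If that assumption holds for BioPerl 1.2.3, the fix would be either to pass the arguments by position or to upgrade to a release whose translate documentation lists named parameters such as -frame and -complete; check the POD of the installed Bio::PrimarySeqI before relying on either form.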


    I am trying to translate a DNA sequence to a protein sequence using all 3 ORFs (same strand).

    What? ORF sounds like something Pinky would say :D
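
For readers unfamiliar with the term: ORF is short for open reading frame, and the question is about reading the same strand at the three possible codon offsets (0, 1 and 2). The sketch below only illustrates how the three frames split one sequence into codons; frame_codons is a hypothetical helper name, and nothing here is BioPerl.

```perl
use strict;
use warnings;

# Split a DNA string into codons for each of the three forward reading frames.
# frame_codons() is a hypothetical helper used only for this sketch.
sub frame_codons {
    my ($dna) = @_;
    my @frames;
    for my $offset (0 .. 2) {
        # Skip $offset leading bases, then keep complete triplets only.
        my @codons = substr($dna, $offset) =~ /(...)/g;
        push @frames, \@codons;
    }
    return @frames;
}

my @frames = frame_codons('ATGAAAGGAACA');
print "frame $_: @{ $frames[$_] }\n" for 0 .. 2;
# frame 0: ATG AAA GGA ACA
# frame 1: TGA AAG GAA
# frame 2: GAA AGG AAC
```

Trailing bases that do not fill a whole codon are dropped in frames 1 and 2.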

      Thank you for your answer. The idea is coming from here:

      BioPerl Wiki

      Gosh, have not seen Pinky in ages... :-D
      Not sure if he would have ever talked about ORFs ;)