Re: translating multiple DNA sequence to protein sequence

I have now written a modified program for translating DNA sequences.

It fails with this error:

"Bad Codon ATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTGC......!!!!"

I want to translate the DNA sequences in a FASTA file into their respective protein sequences. I need help from the Perl monks.

Here is the code I am using now:

use strict;
#use warnings;
use Encode; 

for my $file (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $file;
    my $input = join q{}, <$fh>; 
    close $fh;
   
   while ( $input =~ /(^>.*?\w?)$([^>]*)/smxg ) {
   
        my $name = $1;
        my $seq = $2;
        $seq =~ s/\n//smxg;
        my $trans = codon2aa($seq);
        print "$name\t$trans\n";
    }
}

sub codon2aa {
    my($codon) = @_;

    $codon = uc $codon;
 
    my(%genetic_code) = (
    
    'TCA' => 'S',    # Serine
    'TCC' => 'S',    # Serine
    'TCG' => 'S',    # Serine
    'TCT' => 'S',    # Serine
    'TTC' => 'F',    # Phenylalanine
    'TTT' => 'F',    # Phenylalanine
    'TTA' => 'L',    # Leucine
    'TTG' => 'L',    # Leucine
    'TAC' => 'Y',    # Tyrosine
    'TAT' => 'Y',    # Tyrosine
    'TAA' => '_',    # Stop
    'TAG' => '_',    # Stop
    'TGC' => 'C',    # Cysteine
    'TGT' => 'C',    # Cysteine
    'TGA' => '_',    # Stop
    'TGG' => 'W',    # Tryptophan
    'CTA' => 'L',    # Leucine
    'CTC' => 'L',    # Leucine
    'CTG' => 'L',    # Leucine
    'CTT' => 'L',    # Leucine
    'CCA' => 'P',    # Proline
    'CCC' => 'P',    # Proline
    'CCG' => 'P',    # Proline
    'CCT' => 'P',    # Proline
    'CAC' => 'H',    # Histidine
    'CAT' => 'H',    # Histidine
    'CAA' => 'Q',    # Glutamine
    'CAG' => 'Q',    # Glutamine
    'CGA' => 'R',    # Arginine
    'CGC' => 'R',    # Arginine
    'CGG' => 'R',    # Arginine
    'CGT' => 'R',    # Arginine
    'ATA' => 'I',    # Isoleucine
    'ATC' => 'I',    # Isoleucine
    'ATT' => 'I',    # Isoleucine
    'ATG' => 'M',    # Methionine
    'ACA' => 'T',    # Threonine
    'ACC' => 'T',    # Threonine
    'ACG' => 'T',    # Threonine
    'ACT' => 'T',    # Threonine
    'AAC' => 'N',    # Asparagine
    'AAT' => 'N',    # Asparagine
    'AAA' => 'K',    # Lysine
    'AAG' => 'K',    # Lysine
    'AGC' => 'S',    # Serine
    'AGT' => 'S',    # Serine
    'AGA' => 'R',    # Arginine
    'AGG' => 'R',    # Arginine
    'GTA' => 'V',    # Valine
    'GTC' => 'V',    # Valine
    'GTG' => 'V',    # Valine
    'GTT' => 'V',    # Valine
    'GCA' => 'A',    # Alanine
    'GCC' => 'A',    # Alanine
    'GCG' => 'A',    # Alanine
    'GCT' => 'A',    # Alanine
    'GAC' => 'D',    # Aspartic Acid
    'GAT' => 'D',    # Aspartic Acid
    'GAA' => 'E',    # Glutamic Acid
    'GAG' => 'E',    # Glutamic Acid
    'GGA' => 'G',    # Glycine
    'GGC' => 'G',    # Glycine
    'GGG' => 'G',    # Glycine
    'GGT' => 'G',    # Glycine
    );

    if(exists $genetic_code{$codon}) {
        return $genetic_code{$codon};
    }else{

            print STDERR "Bad codon \"$codon\"!!\n";
            exit;
    }
}

Re^2: translating multiple DNA sequence to protein sequence by choroba (Cardinal) on Aug 22, 2013 at 07:43 UTC
The subroutine is called "codon2aa". You supply the sequence as the parameter, but you should run it on individual codons: `for my $codon ($seq =~ /(...)/g) { my $trans = codon2aa($codon); print "$name\t$trans\n"; }` لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
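To illustrate the point about working on individual codons, here is a minimal sketch that translates a whole sequence into one protein string instead of printing one line per codon. The helper name `translate_seq`, the compact table construction, and the `'X'` placeholder for unknown codons are assumptions made for this example and are not part of the original post; `'_'` for stop codons follows the original table. It needs Perl 5.10 or later for the `//` operator.

```perl
use strict;
use warnings;

# Build the standard codon table in TCAG order; '_' marks a stop codon,
# matching the convention used in the original post.
my @bases = qw(T C A G);
my @aa    = split //,
    'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %genetic_code;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $genetic_code{"$first$second$third"} = shift @aa;
        }
    }
}

# Hypothetical helper: translate a whole sequence, one codon at a time.
sub translate_seq {
    my ($seq) = @_;
    $seq = uc $seq;
    my $protein = '';
    # /(...)/g in list context yields successive non-overlapping triplets;
    # a trailing partial codon (1 or 2 bases) is silently ignored.
    for my $codon ($seq =~ /(...)/g) {
        # 'X' for a codon that is not in the table is an assumption.
        $protein .= $genetic_code{$codon} // 'X';
    }
    return $protein;
}

print translate_seq('ATGATCTAGTCG'), "\n";    # prints "MI_S"
```

Returning a placeholder rather than calling `exit` on a bad codon is a design choice here, so that one ambiguous base (an `N`, for instance) does not abort the whole run.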
Re^3: translating multiple DNA sequence to protein sequence by yuvraj_ghaly (Sexton) on Aug 22, 2013 at 09:18 UTC
This won't work when there are numerous DNA sequences in the FASTA file.
Re^4: translating multiple DNA sequence to protein sequence by choroba (Cardinal) on Aug 22, 2013 at 09:25 UTC
I tested the code with the following data:

>header1
ATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTG
CATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTGC
>header2
ATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTGC
ATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTGC

It gives the following output (collapsed in the original post, 1211 bytes).

Please be more specific and explain what you mean by "won't work". لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
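For anyone who wants to check the multi-record case themselves, the following is a rough sketch of reading several FASTA records and splitting each one into codons. It is not the code from the collapsed output above. The helper `read_fasta` and the line-by-line approach are assumptions for illustration, and the data is the test input quoted in this reply, held in a string so the example is self-contained.

```perl
use strict;
use warnings;

# Test data from the reply, embedded so the sketch runs on its own.
my $fasta = <<'END';
>header1
ATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTG
CATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTGC
>header2
ATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTGC
ATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTGC
END

# Hypothetical helper: return a list of [header, sequence] pairs.
sub read_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(.*)/) {
            # A '>' line starts a new record with an empty sequence.
            push @records, [$1, ''];
        }
        elsif (@records) {
            # Sequence lines are appended to the most recent record.
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

for my $rec (read_fasta($fasta)) {
    my ($name, $seq) = @$rec;
    my @codons = $seq =~ /(...)/g;
    printf "%s\t%d bases\t%d codons\n", $name, length($seq), scalar @codons;
}
```

Each record is handled independently, so the number of sequences in the file should not matter; each codon in `@codons` can then be passed to `codon2aa` as suggested earlier in the thread.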
Re^5: translating multiple DNA sequence to protein sequence by yuvraj_ghaly (Sexton) on Aug 22, 2013 at 10:22 UTC
Re^6: translating multiple DNA sequence to protein sequence by choroba (Cardinal) on Aug 22, 2013 at 10:28 UTC