Translating multiple DNA sequences to protein sequences

yuvraj_ghaly has asked for the wisdom of the Perl Monks concerning the following question:

I want to translate the DNA sequences in a multi-FASTA file. I have written code, but it translates only one sequence per file. So the question is: how would I translate each DNA sequence in a multi-FASTA file into its respective protein sequence?

print "ENTER THE FILENAME OF THE DNA SEQUENCE:= ";
$DNAfilename = <STDIN>;
chomp $DNAfilename;
unless ( open(DNAFILE, $DNAfilename) ) {
    print "Cannot open file \"$DNAfilename\"\n\n";
}
@DNA = <DNAFILE>;
close DNAFILE;
$DNA = join( '', @DNA);
print " \nThe original DNA file is:\n$DNA \n";
$DNA =~ s/\s//g;
my $protein='';
my $codon;
for(my $i=0;$i<(length($DNA)-2);$i+=3)
{
    $codon=substr($DNA,$i,3);
    $protein.=&codon2aa($codon);
}
print "The translated protein is :\n$protein\n";
<STDIN>;

sub codon2aa{
    my($codon)=@_;
    $codon=uc $codon;
    my(%g)=('TCA'=>'S','TCC'=>'S','TCG'=>'S','TCT'=>'S','TTC'=>'F','TTT'=>'F','TTA'=>'L','TTG'=>'L',
            'TAC'=>'Y','TAT'=>'Y','TAA'=>'_','TAG'=>'_','TGC'=>'C','TGT'=>'C','TGA'=>'_','TGG'=>'W',
            'CTA'=>'L','CTC'=>'L','CTG'=>'L','CTT'=>'L','CCA'=>'P','CCC'=>'P','CCG'=>'P','CCT'=>'P',
            'CAC'=>'H','CAT'=>'H','CAA'=>'Q','CAG'=>'Q','CGA'=>'R','CGC'=>'R','CGG'=>'R','CGT'=>'R',
            'ATA'=>'I','ATC'=>'I','ATT'=>'I','ATG'=>'M','ACA'=>'T','ACC'=>'T','ACG'=>'T','ACT'=>'T',
            'AAC'=>'N','AAT'=>'N','AAA'=>'K','AAG'=>'K','AGC'=>'S','AGT'=>'S','AGA'=>'R','AGG'=>'R',
            'GTA'=>'V','GTC'=>'V','GTG'=>'V','GTT'=>'V','GCA'=>'A','GCC'=>'A','GCG'=>'A','GCT'=>'A',
            'GAC'=>'D','GAT'=>'D','GAA'=>'E','GAG'=>'E','GGA'=>'G','GGC'=>'G','GGG'=>'G','GGT'=>'G');
    if(exists $g{$codon})
    {
        return $g{$codon};
    }
    else
    {
        print STDERR "Bad codon \"$codon\"!!\n";
        exit;
    }
}
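Typing all 64 codons by hand makes typos easy. A minimal sketch of an alternative, assuming the standard genetic code (NCBI translation table 1), builds the same table from a 64-letter string of amino acids listed in T, C, A, G codon order, mapping stops to `_` as the table above does. Unlike the original, this version of `codon2aa` dies on an unknown codon instead of printing a message and calling `exit`.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1): the amino acid for each of the 64
# codons, enumerated with T, C, A, G at each position ('*' = stop).
my @bases   = qw(T C A G);
my $aa_list = 'FFLLSSSSYY**CC*W' . 'LLLLPPPPHHQQRRRR'
            . 'IIIMTTTTNNKKSSRR' . 'VVVVAAAADDEEGGGG';

my %genetic_code;
my $i = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            my $residue = substr $aa_list, $i++, 1;
            $residue =~ tr/*/_/;    # use '_' for stop, as the original table does
            $genetic_code{"$first$second$third"} = $residue;
        }
    }
}

# Look up one codon (case-insensitive); die on anything not in the table.
sub codon2aa {
    my ($codon) = @_;
    my $residue = $genetic_code{ uc $codon };
    die qq[Bad codon "$codon"!!\n] unless defined $residue;
    return $residue;
}
```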


Re: translating multiple DNA sequence to protein sequence by polypompholyx (Chaplain) on Aug 19, 2013 at 11:16 UTC
If you're planning on using Perl for bioinformatics, you might be better off installing BioPerl rather than hand-rolling FASTA parsers and translation codon tables.

use Bio::SeqIO;

my $sequences = Bio::SeqIO->new(
    -file   => "sequence.fasta",
    -format => "fasta",
);

while ( my $dna = $sequences->next_seq ){
    my $protein = $dna->translate(
        -codontable_id => 1,  # standard genetic code
        -frame         => 0,  # reading-frame offset 0
    );
    print $dna->display_id, "\n";
    print $protein->seq, "\n\n";
}

Having said that, installing BioPerl (1.6.901) on Windows seems to be more difficult than I was expecting: I had to resort to `force` with Strawberry and CPAN, having simply given up trying to get it to install with ActivePerl and PPM.
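If BioPerl won't install, the same loop can be approximated in core Perl. This is a minimal sketch, not BioPerl's implementation: it assumes a plain multi-FASTA file (header lines start with `>`, sequences may wrap over several lines), uses the standard genetic code with `*` for stops, translates frame 0 only, and turns unknown codons into `X`. `read_fasta` and `translate_dna` are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1): one amino acid per codon, codons
# enumerated with T, C, A, G at each position; '*' marks a stop codon.
my @bases   = qw(T C A G);
my $aa_list = 'FFLLSSSSYY**CC*W' . 'LLLLPPPPHHQQRRRR'
            . 'IIIMTTTTNNKKSSRR' . 'VVVVAAAADDEEGGGG';
my %code;
my $i = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $code{"$first$second$third"} = substr $aa_list, $i++, 1;
        }
    }
}

# Read a multi-FASTA filehandle into a list of [ id, sequence ] pairs.
# The id is the first word after '>'; wrapped sequence lines are joined.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(\S*)/ ) {
            push @records, [ $1, '' ];       # start a new record
        }
        elsif (@records) {
            ( my $chunk = uc $line ) =~ s/\s+//g;
            $records[-1][1] .= $chunk;       # append to the current record
        }
    }
    return @records;
}

# Translate in frame 0: unknown codons (e.g. containing N) become 'X',
# and a trailing partial codon is ignored.
sub translate_dna {
    my ($dna) = @_;
    return join '', map { $code{$_} // 'X' } $dna =~ /(...)/g;
}

# Translate every record of the file named on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open '$ARGV[0]': $!\n";
    for my $rec ( read_fasta($fh) ) {
        my ( $id, $dna ) = @$rec;
        print "$id\n", translate_dna($dna), "\n\n";
    }
    close $fh;
}
```

Saved under a hypothetical name such as `translate.pl`, it would be run as `perl translate.pl sequence.fasta`; like the BioPerl loop, it prints each record's ID followed by its protein.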
Re: translating multiple DNA sequence to protein sequence by jwkrahn (Abbot) on Aug 19, 2013 at 08:44 UTC
#!/usr/bin/perl
use warnings;
use strict;

print 'ENTER THE FILENAME OF THE DNA SEQUENCE:= ';
chomp( my $DNAfilename = <STDIN> );

open my $DNAFILE, $DNAfilename
    or die qq[Cannot open file "$DNAfilename" because: $!];

local $/;
( my $DNA = uc <$DNAFILE> ) =~ tr/ACGT//cd;
print "\nThe original DNA file is:\n$DNA\n";

my %codon2aa = qw(
    TCA S  TCC S  TCG S  TCT S  TTC F  TTT F  TTA L  TTG L
    TAC Y  TAT Y  TAA _  TAG _  TGC C  TGT C  TGA _  TGG W
    CTA L  CTC L  CTG L  CTT L  CCA P  CCC P  CCG P  CCT P
    CAC H  CAT H  CAA Q  CAG Q  CGA R  CGC R  CGG R  CGT R
    ATA I  ATC I  ATT I  ATG M  ACA T  ACC T  ACG T  ACT T
    AAC N  AAT N  AAA K  AAG K  AGC S  AGT S  AGA R  AGG R
    GTA V  GTC V  GTG V  GTT V  GCA A  GCC A  GCG A  GCT A
    GAC D  GAT D  GAA E  GAG E  GGA G  GGC G  GGG G  GGT G
);

my $protein = '';
while ( $DNA =~ /(...)/g ) {
    exists $codon2aa{ $1 } or die qq[Bad codon "$1"!!\n];
    $protein .= $codon2aa{ $1 };
}
print "The translated protein is :\n$protein\n";
<STDIN>;
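One caveat when this is fed a FASTA file: after `uc`, `tr/ACGT//cd` keeps any A, C, G or T letters that appear in header lines, so they end up in front of the sequence and shift the reading frame. The sketch below uses hypothetical helper names and a hypothetical one-record input to show the leak and one way to drop header lines first; a multi-FASTA file still needs splitting into records before translation.

```perl
use strict;
use warnings;

# Cleanup as in the reply above: upper-case, then delete every
# character that is not A, C, G or T.
sub naive_clean {
    ( my $dna = uc shift ) =~ tr/ACGT//cd;
    return $dna;
}

# Same cleanup, but lines starting with '>' are removed first.
sub clean_without_headers {
    ( my $text = shift ) =~ s/^>.*\n?//mg;
    ( my $dna = uc $text ) =~ tr/ACGT//cd;
    return $dna;
}

my $fasta = ">gene_A\nATGAAA\n";            # hypothetical one-record input
print naive_clean($fasta), "\n";            # GAATGAAA (header letters leak in)
print clean_without_headers($fasta), "\n";  # ATGAAA
```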
Re: translating multiple DNA sequence to protein sequence by marto (Cardinal) on Aug 19, 2013 at 07:51 UTC
What is this, a question or a code submission? If you're not asking a question, then you've posted it in the wrong place. A link to "Where should I post X?" is displayed each time you post. Seekers of Perl Wisdom is for questions; Cool Uses for Perl is for code you want to share.

You may want to read open: use the three-argument form of open, and actually die if you can't open your input file, printing `$!` to tell users why it failed.
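A minimal sketch of the three-argument open with `die` and `$!`; `read_file` is a hypothetical name, and the file is slurped whole as in the original post.

```perl
use strict;
use warnings;

# Slurp a file using the three-argument form of open, dying with the
# system error ($!) if it can't be opened.
sub read_file {
    my ($filename) = @_;
    open my $fh, '<', $filename
        or die qq[Cannot open file "$filename": $!\n];
    local $/;                 # slurp mode, limited to this sub
    my $contents = <$fh>;
    close $fh;
    return $contents;
}
```

Keeping `local $/` inside the sub limits slurp mode to this one read, so a later `<STDIN>` still returns a single line.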
Re: translating multiple DNA sequence to protein sequence by kcott (Archbishop) on Aug 19, 2013 at 07:53 UTC
G'day yuvraj_ghaly,

"This code will help to translate a DNA sequence to protein sequence. The need of an our is to translate all the DNA sequences present in fasta file into protein sequences respectively. Here is the code: ..."

There is no question here! You have been directed to "How do I post a question effectively?" on more than one occasion in the past. Please actually read it this time and follow its guidelines.

I have downvoted your post.

-- Ken
Re^2: translating multiple DNA sequence to protein sequence by bioinformatics (Friar) on Aug 19, 2013 at 15:07 UTC
The OP is looking for help with a FASTA parser; the 'question' was just worded as a statement of what they need.

Bioinformatics
Re: translating multiple DNA sequence to protein sequence by Monk::Thomas (Friar) on Aug 19, 2013 at 07:46 UTC
"Here is the code:" What is the question?
Re^2: translating multiple DNA sequence to protein sequence by yuvraj_ghaly (Sexton) on Aug 20, 2013 at 03:52 UTC
The question is: I would like to extract the sequences from a multi-FASTA file. How would I modify this code to do so?
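One way to extract the sequences, offered as a sketch rather than the thread's answer, is to let Perl read one record at a time by setting the input record separator `$/` to `"\n>"`. It assumes well-formed FASTA in which `>` appears only at the start of header lines; `fasta_records` is a hypothetical name.

```perl
use strict;
use warnings;

# Read one FASTA record per loop iteration by making "\n>" the record
# separator. Returns a list of [ header, sequence ] pairs.
sub fasta_records {
    my ($fh) = @_;
    local $/ = "\n>";
    my @records;
    while ( my $chunk = <$fh> ) {
        chomp $chunk;                 # drops the trailing "\n>"
        $chunk =~ s/\r//g;            # tolerate Windows line endings
        $chunk =~ s/^>//;             # only the first record keeps its '>'
        my ( $header, @lines ) = split /\n/, $chunk;
        next unless defined $header;
        ( my $seq = join '', @lines ) =~ s/\s+//g;
        push @records, [ $header, $seq ];
    }
    return @records;
}
```

Each pair can then be handed to the translation code one sequence at a time.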
Re: translating multiple DNA sequence to protein sequence by yuvraj_ghaly (Sexton) on Aug 22, 2013 at 05:40 UTC
I have now written a modified program for translating DNA sequences, but it gives this error output: "Bad Codon ATGATCTAGTCGATCGCTAGCTAGATCGCTAGCTGC......!!!!" I want to translate the DNA sequences in a FASTA file into their respective protein sequences, and I need help from the Perl Monks. Here is the code I am using now:

use strict;
#use warnings;
use Encode;

for my $file (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $file;
    my $input = join q{}, <$fh>;
    close $fh;
    while ( $input =~ /(^>.?\w?)$([^>])/smxg ) {
        my $name = $1;
        my $seq = $2;
        $seq =~ s/\n//smxg;
        my $trans = codon2aa($seq);
        print "$name\t$trans\n";
    }
}

sub codon2aa {
    my($codon) = @_;
    $codon = uc $codon;
    my(%genetic_code) = (
        'TCA' => 'S', 'TCC' => 'S', 'TCG' => 'S', 'TCT' => 'S',   # Serine
        'TTC' => 'F', 'TTT' => 'F',                               # Phenylalanine
        'TTA' => 'L', 'TTG' => 'L',                               # Leucine
        'TAC' => 'Y', 'TAT' => 'Y',                               # Tyrosine
        'TAA' => '_', 'TAG' => '_',                               # Stop
        'TGC' => 'C', 'TGT' => 'C',                               # Cysteine
        'TGA' => '_',                                             # Stop
        'TGG' => 'W',                                             # Tryptophan
        'CTA' => 'L', 'CTC' => 'L', 'CTG' => 'L', 'CTT' => 'L',   # Leucine
        'CCA' => 'P', 'CCC' => 'P', 'CCG' => 'P', 'CCT' => 'P',   # Proline
        'CAC' => 'H', 'CAT' => 'H',                               # Histidine
        'CAA' => 'Q', 'CAG' => 'Q',                               # Glutamine
        'CGA' => 'R', 'CGC' => 'R', 'CGG' => 'R', 'CGT' => 'R',   # Arginine
        'ATA' => 'I', 'ATC' => 'I', 'ATT' => 'I',                 # Isoleucine
        'ATG' => 'M',                                             # Methionine
        'ACA' => 'T', 'ACC' => 'T', 'ACG' => 'T', 'ACT' => 'T',   # Threonine
        'AAC' => 'N', 'AAT' => 'N',                               # Asparagine
        'AAA' => 'K', 'AAG' => 'K',                               # Lysine
        'AGC' => 'S', 'AGT' => 'S',                               # Serine
        'AGA' => 'R', 'AGG' => 'R',                               # Arginine
        'GTA' => 'V', 'GTC' => 'V', 'GTG' => 'V', 'GTT' => 'V',   # Valine
        'GCA' => 'A', 'GCC' => 'A', 'GCG' => 'A', 'GCT' => 'A',   # Alanine
        'GAC' => 'D', 'GAT' => 'D',                               # Aspartic Acid
        'GAA' => 'E', 'GAG' => 'E',                               # Glutamic Acid
        'GGA' => 'G', 'GGC' => 'G', 'GGG' => 'G', 'GGT' => 'G',   # Glycine
    );
    if(exists $genetic_code{$codon}) {
        return $genetic_code{$codon};
    }else{
        print STDERR "Bad codon \"$codon\"!!\n";
        exit;
    }
}
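For comparison, here is a minimal sketch of a single global regex that captures each header line and its sequence (possibly wrapped over several lines) from the slurped file. `parse_fasta_string` is a hypothetical name, and the pattern assumes `>` appears only at the start of header lines.

```perl
use strict;
use warnings;

# Pull [ header, sequence ] pairs out of a whole multi-FASTA string:
# the header is the rest of the '>' line, the sequence is everything
# up to the next '>' (or the end of the string).
sub parse_fasta_string {
    my ($input) = @_;
    my @records;
    while ( $input =~ /^>([^\n]*)\n([^>]*)/mg ) {
        my ( $name, $seq ) = ( $1, $2 );
        $seq =~ s/\s+//g;             # join wrapped lines, drop \r etc.
        push @records, [ $name, $seq ];
    }
    return @records;
}
```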
Re^2: translating multiple DNA sequence to protein sequence by choroba (Cardinal) on Aug 22, 2013 at 07:43 UTC
The subroutine is called "codon2aa". You supply the sequence as the parameter, but you should run it on individual codons:

for my $codon ($seq =~ /(...)/g) {
    my $trans = codon2aa($codon);
    print "$name\t$trans\n";
}
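Building on that, a minimal sketch that collects the amino acids of one record into a single string, so each record prints once as name, tab, protein. `translate_seq` is a hypothetical name; it assumes `$table` is a complete codon table such as the OP's `%genetic_code`, and the demo uses a three-codon excerpt only to keep it short.

```perl
use strict;
use warnings;

# Translate one whole sequence codon by codon and join the residues.
# Unknown codons die, mirroring the "Bad codon" check in codon2aa.
sub translate_seq {
    my ( $seq, $table ) = @_;
    my $protein = '';
    for my $codon ( uc($seq) =~ /(...)/g ) {
        exists $table->{$codon} or die qq[Bad codon "$codon"!!\n];
        $protein .= $table->{$codon};
    }
    return $protein;
}

# Demo with a hypothetical three-entry excerpt of the codon table.
my %excerpt = ( ATG => 'M', GCC => 'A', TAA => '_' );
print "seq1\t", translate_seq( 'atggcctaa', \%excerpt ), "\n";
```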
Re^3: translating multiple DNA sequence to protein sequence by yuvraj_ghaly (Sexton) on Aug 22, 2013 at 09:18 UTC
This won't work when there are numerous DNA sequences in the FASTA file.
Re^4: translating multiple DNA sequence to protein sequence by choroba (Cardinal) on Aug 22, 2013 at 09:25 UTC
Re^5: translating multiple DNA sequence to protein sequence by yuvraj_ghaly (Sexton) on Aug 22, 2013 at 10:22 UTC