Hello all,

I have a DNA sequence and I want to find all start codons (ATG, GTG) and stop codons. I want to translate every subsequence between a start codon and a stop codon into its corresponding protein sequence. This should be done on the first reading frame only.

For instance, $Dna = "AAAATGGGGTAAGTGAACGGGTAA" should return the proteins corresponding to "ATGGGGTAA" and "GTGAACGGGTAA", and it should also work for very long sequences.

I have tried to write something like this, but it doesn't seem to work:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Bio::SearchIO;
    use Bio::Seq;
    use Bio::SeqIO;

    my ($seqio);
    my ($genio);
    if (-e $ARGV[0]) {
        $seqio = Bio::SeqIO->new('-format' => 'fasta', -file => $ARGV[0]);
    }
    else {
        die "$ARGV[0] not found\n";
    }

    my ($seq)  = "";
    my ($len)  = "";
    my ($head) = "";
    while (my $seqobj = $seqio->next_seq) {
        $len  = $seqobj->length();
        $head = $seqobj->id();
        $seq  = $seqobj->seq();
        chomp($seq);
    }

    my (%genetic_code) = (
        'TCA' => 'S',    # Serine
        'TCC' => 'S',    # Serine
        'TCG' => 'S',    # Serine
        'TCT' => 'S',    # Serine
        'TTC' => 'F',    # Phenylalanine
        'TTT' => 'F',    # Phenylalanine
        'TTA' => 'L',    # Leucine
        'TTG' => 'L',    # Leucine
        'TAC' => 'Y',    # Tyrosine
        'TAT' => 'Y',    # Tyrosine
        'TAA' => '_',    # Stop
        'TAG' => '_',    # Stop
        'TGC' => 'C',    # Cysteine
        'TGT' => 'C',    # Cysteine
        'TGA' => '_',    # Stop
        'TGG' => 'W',    # Tryptophan
        'CTA' => 'L',    # Leucine
        'CTC' => 'L',    # Leucine
        'CTG' => 'L',    # Leucine
        'CTT' => 'L',    # Leucine
        'CCA' => 'P',    # Proline
        'CCC' => 'P',    # Proline
        'CCG' => 'P',    # Proline
        'CCT' => 'P',    # Proline
        'CAC' => 'H',    # Histidine
        'CAT' => 'H',    # Histidine
        'CAA' => 'Q',    # Glutamine
        'CAG' => 'Q',    # Glutamine
        'CGA' => 'R',    # Arginine
        'CGC' => 'R',    # Arginine
        'CGG' => 'R',    # Arginine
        'CGT' => 'R',    # Arginine
        'ATA' => 'I',    # Isoleucine
        'ATC' => 'I',    # Isoleucine
        'ATT' => 'I',    # Isoleucine
        'ATG' => 'M',    # Methionine
        'ACA' => 'T',    # Threonine
        'ACC' => 'T',    # Threonine
        'ACG' => 'T',    # Threonine
        'ACT' => 'T',    # Threonine
        'AAC' => 'N',    # Asparagine
        'AAT' => 'N',    # Asparagine
        'AAA' => 'K',    # Lysine
        'AAG' => 'K',    # Lysine
        'AGC' => 'S',    # Serine
        'AGT' => 'S',    # Serine
        'AGA' => 'R',    # Arginine
        'AGG' => 'R',    # Arginine
        'GTA' => 'V',    # Valine
        'GTC' => 'V',    # Valine
        'GTG' => 'V',    # Valine
        'GTT' => 'V',    # Valine
        'GCA' => 'A',    # Alanine
        'GCC' => 'A',    # Alanine
        'GCG' => 'A',    # Alanine
        'GCT' => 'A',    # Alanine
        'GAC' => 'D',    # Aspartic Acid
        'GAT' => 'D',    # Aspartic Acid
        'GAA' => 'E',    # Glutamic Acid
        'GAG' => 'E',    # Glutamic Acid
        'GGA' => 'G',    # Glycine
        'GGC' => 'G',    # Glycine
        'GGG' => 'G',    # Glycine
        'GGT' => 'G',    # Glycine
    );

    my @startsRF1 = ();
    my @startsRF2 = ();
    my @startsRF3 = ();
    my @stopsRF1  = ();
    my @stopsRF2  = ();
    my @stopsRF3  = ();
    my @arrayOfORFs         = ();
    my @arrayOfTranslations = ();
    my $joinedAminoAcids    = ();

    while ($seq =~ m/ATG|TTG|CTG|ATT|CTA|GTG|ATT/gi) {
        my $matchPosition = pos($seq) - 3;
        if (($matchPosition % 3) == 0) {
            push(@startsRF1, $matchPosition);
        }
        while ($seq =~ m/TAG|TAA|TGA/gi) {
            my $matchPosition = pos($seq);
            if (($matchPosition % 3) == 0) {
                push(@stopsRF1, $matchPosition);
            }
            my $codonRange    = "";
            my $startPosition = 0;
            my $stopPosition  = 0;
            @startsRF1 = reverse(@startsRF1);
            @stopsRF1  = reverse(@stopsRF1);
            while (scalar(@startsRF1) > 0) {
                $codonRange    = "";
                $startPosition = pop(@startsRF1);
                if ($startPosition < $stopPosition) {
                    next;
                }
                my $ORFseq = "";
                while (scalar(@stopsRF1) > 0) {
                    $stopPosition = pop(@stopsRF1);
                    if ($stopPosition > $startPosition) {
                        my $difF = $stopPosition - $startPosition;
                        $ORFseq = substr($seq, $startPosition, (length($seq) - $difF));
                        push(@arrayOfORFs, $ORFseq);
                    }
                    foreach $ORFseq (@arrayOfORFs) {
                        my @growingProtein = ();
                        for (my $i = 0; $i <= (length($ORFseq) - 3); $i = $i + 3) {
                            my $codon = substr($ORFseq, $i, 3);
                            if (exists($genetic_code{$codon})) {
                                push(@growingProtein, $genetic_code{$codon});
                            }
                            else {
                                push(@growingProtein, "X");
                            }
                        }
                        my $joinedAminoAcids = join("", @growingProtein);
                        push(@arrayOfTranslations, $joinedAminoAcids);
                    }
                    foreach (@arrayOfTranslations) {
                        print $_, "\n";
                    }
                }
            }
        }
    }

I NEED HELP URGENTLY PLEASE.

Regards,
Emman
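
A minimal sketch of one way to do what the post describes, without BioPerl: walk the sequence codon by codon in frame 1, remember every start codon (ATG or GTG) seen since the last stop, and emit one ORF per remembered start when an in-frame stop (TAA, TAG, TGA) is reached. The names `frame1_orfs` and `translate` are made up for this illustration. Two choices are assumptions rather than anything stated in the question: nested starts each yield their own ORF, and GTG is translated as V (as in the table in the original script), not as an initiator M.

```perl
use strict;
use warnings;

# Standard genetic code, built from the usual 64-letter string in TCAG order.
# '*' marks a stop codon.
my @bases = qw(T C A G);
my $aa    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %genetic_code;
my $idx = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $genetic_code{"$b1$b2$b3"} = substr($aa, $idx++, 1);
        }
    }
}

# Hypothetical helper: return every frame-1 ORF (start codon through stop
# codon, inclusive) as a DNA string. Assumes ATG and GTG are the only starts.
sub frame1_orfs {
    my ($dna) = @_;
    $dna = uc $dna;
    my %is_start = map { $_ => 1 } qw(ATG GTG);
    my %is_stop  = map { $_ => 1 } qw(TAA TAG TGA);
    my @open;    # offsets of starts still waiting for a stop
    my @orfs;
    for (my $pos = 0; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        if ($is_stop{$codon}) {
            # Close every pending start at this stop codon.
            push @orfs, map { substr($dna, $_, $pos + 3 - $_) } @open;
            @open = ();
        }
        elsif ($is_start{$codon}) {
            push @open, $pos;
        }
    }
    return @orfs;    # starts with no downstream stop are dropped
}

# Hypothetical helper: translate an ORF, stopping at the first stop codon.
# Codons not in the table (e.g. containing N) become 'X'.
sub translate {
    my ($orf) = @_;
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length($orf); $pos += 3) {
        my $residue = $genetic_code{ substr($orf, $pos, 3) } // 'X';
        last if $residue eq '*';
        $protein .= $residue;
    }
    return $protein;
}

my $dna = "AAAATGGGGTAAGTGAACGGGTAA";
print "$_\t", translate($_), "\n" for frame1_orfs($dna);
# prints:
# ATGGGGTAA	MG
# GTGAACGGGTAA	VNG
```

Because the scan is a single pass over the codons, it should cope with long sequences. To combine it with the FASTA reader in the original script, the string returned by `$seqobj->seq()` could be passed to `frame1_orfs` inside the `next_seq` loop.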


In reply to orf subsequences by odegbon
