Perlmonks, this is the question I am having trouble answering:
Use a random DNA sequence generator as many times as you need to get the protein-coding region(s) (nucleotide triplets). The minimum sequence length is 500 bp.
• Find all possible protein-coding regions in microbes (between start and stop codons, ATG – TAG|TGA|TAA).
• Using the Standard Genetic Code table (Wikipedia or any other source), create a hash table for all amino acids ($amino{TTT} => "F", ...).
• Write the protein sequences you find to a file in the following format:
Position of 1st start codon: protein sequence [length of protein sequence]
Position of 2nd start codon: ….
For example: 45: FLPQWCV [7]
I am sort of clueless and not sure where to start. To find the coding sequence, I generated a 5000 bp random nucleotide sequence, but every time I use the code below to look for a coding region, it returns nothing. Can anyone tell me what I am doing wrong?
    @nucs = ("A", "C", "G", "T");
    $size = 5000;
    for ($i = 0; $i < $size; $i++) {
        $seqR .= $nucs[int(rand(4))];
    }
    print "Seq($size): $seqR\n";
    if (/ATG([ACGT][ACGT][ACGT]){3,5000}(TAA|TAG|TGA)/) {
        print "This seq. might contain a coding region\n"
    } else {
        print "This sequence most likely does not contain a coding region\n"
    }
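One likely reason the test never succeeds: a bare /.../ match is applied to $_, which is never set in this script, rather than to $seqR. Below is a minimal sketch that binds the pattern to the sequence with =~. The short fixed test sequence, the non-capturing (?:...) groups, and the non-greedy quantifier are my own choices for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical short test sequence with a known region:
# CC ATG AAA CCC GGG TAA CC  (start codon, three codons, stop codon)
my $seqR = 'CCATGAAACCCGGGTAACC';

# Bind the pattern to $seqR with =~ ; a bare /.../ would test $_ instead.
if ($seqR =~ /ATG(?:[ACGT]{3}){3,5000}?(?:TAA|TAG|TGA)/) {
    # $-[0] holds the 0-based offset where the match started.
    print "This seq. might contain a coding region at offset $-[0]\n";
} else {
    print "This sequence most likely does not contain a coding region\n";
}
```

Running the original script with "use warnings;" should also produce a "Use of uninitialized value $_" warning at the match, which points at the same problem.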
In reply to Bioinformatics coding question by charm
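For the assignment as a whole, here is a rough sketch of one way the three steps (find regions, build the codon hash, write the output file) could fit together. It rests on several assumptions that the assignment text does not settle: positions are reported 1-based, the start codon is translated and kept as a leading M (the assignment's own example does not start with M, so you may need to drop it), each ATG is paired with its first in-frame stop codon, and the output file name orfs.txt is made up. The helper names translate and find_orfs are mine as well.

```perl
use strict;
use warnings;

# Standard genetic code, built from the usual T,C,A,G ordering of the table.
my @bases = qw(T C A G);
my @aa    = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %amino;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $amino{"$b1$b2$b3"} = shift @aa;    # e.g. $amino{TTT} = "F"
        }
    }
}

# Translate a coding region codon by codon (stop codon not included).
sub translate {
    my ($dna) = @_;
    return join '', map { $amino{$_} } $dna =~ /(...)/g;
}

# Return [position, protein] pairs for every ATG with an in-frame stop codon.
sub find_orfs {
    my ($seq) = @_;
    my @orfs;
    # The lookahead keeps the match zero-width, so overlapping or nested
    # start codons are all reported; *? stops at the first in-frame stop.
    while ($seq =~ /(?=(ATG(?:[ACGT]{3})*?)(?:TAA|TAG|TGA))/g) {
        push @orfs, [ $-[0] + 1, translate($1) ];    # 1-based (assumption)
    }
    return @orfs;
}

# Random test sequence, 5000 bp as in the original post.
my @nucs = qw(A C G T);
my $seqR = join '', map { $nucs[ int(rand(4)) ] } 1 .. 5000;

# Output format from the assignment: "position: protein [length]".
# 'orfs.txt' is a hypothetical file name.
open my $out, '>', 'orfs.txt' or die "Cannot open orfs.txt: $!";
for my $orf (find_orfs($seqR)) {
    my ($pos, $protein) = @$orf;
    printf {$out} "%d: %s [%d]\n", $pos, $protein, length $protein;
}
close $out;
```

If the "minimum sequence length is 500bp" requirement refers to the coding region rather than to the generated sequence, a length filter on $1 inside find_orfs would be one place to apply it. In random sequence a stop codon turns up often, so regions that long will be rare, which may be why the assignment says to rerun the generator as many times as needed.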