in reply to finding open reading frames
Hello again ic23oluk,
I spend some time and I got even more confused. I got different results from your scrip:
#!usr/bin/perl use strict; use warnings; my %hash; my $substring = 'ATG'; my @tags = ('TAG', 'TAA', 'TGA'); while (<>) { chomp; next if $. < 2; # Skip first line my $found = index($_, $substring); while ($found != -1) { foreach my $tag (@tags) { my $position = index($_, $tag, $found + 2); $hash{$position} = $tag if ($position != -1); } my $offset = $found + 1; $found = index($_, $substring, $offset); } } continue { close ARGV if eof; # reset $. } foreach my $position (sort {$a <=> $b} keys %hash) { print "Position: $position String: $hash{$position}\n"; } __END__ $ perl bio.pl sequence.fa Position: 29 String: TAA Position: 48 String: TGA Position: 73 String: TAA Position: 134 String: TGA Position: 201 String: TAA Position: 221 String: TGA Position: 248 String: TAA Position: 288 String: TGA Position: 309 String: TAG Position: 324 String: TGA Position: 332 String: TAA Position: 378 String: TAA Position: 381 String: TAG Position: 395 String: TGA Position: 408 String: TGA Position: 448 String: TAA Position: 505 String: TGA Position: 512 String: TAA Position: 621 String: TGA Position: 700 String: TGA Position: 722 String: TAA Position: 743 String: TAA Position: 752 String: TGA Position: 857 String: TAA Position: 864 String: TAA Position: 957 String: TAG Position: 974 String: TAA Position: 1002 String: TGA Position: 1019 String: TAA Position: 1052 String: TGA Position: 1198 String: TGA Position: 1210 String: TAG Position: 1253 String: TGA Position: 1265 String: TAA
At this point I do not know if your script is giving you the correct output. I am not sure if mine also works perfectly but I tried to confirm that by using grep and the tags e.g. TAA to check if I should get an output as my script does and yours skips.
For example:
$ cat sequence.fa | grep TAA GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTT + GATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGA +AA ATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGAT +TCTG TCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACT +GGTTTA GATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGA +TTACTATC TGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAA +TCCGTACGTT TCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAAC +CGAAGATGATTT CGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTC +GCTGCGTTGAGGCT TGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTT +GAGTTTATTGCTGCCG TCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCAT +GGAAGGCGCTGAATTTAC GGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGT +TCGCGTTTACCTTGCGTGTA CGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGT +CAAAAATTACGTGCGGAAGGAG TGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGT +CGTCCGCAGCCGTTGCGAGGTACT AAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCA +ACAATTTTAATTGCAGGGGCTTCGGC CCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCG +CCGAGCGTATGCCGCATGACCTTTCCCA TCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACC +ATTTCAACTACTCCGGTTATCGCTGGCGAC TCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTC +TCCATTGCGTCGTGGCCTTGCTATTGACTCTA CTGTAGACATTTTTACTTTTTATGTCCCTCATCGTC +ACGTTTATGGTGAACAGTGGATTAAGTTCATGAA $ cat sequence.fa | grep -o TAA | wc -l 21
Based on that I can tell that I am getting confident that your script is not calculating something correctly.
Update: I took a look again on the bash search (remember to remove first line from file!):
$ cat sequence.fa | grep -b -o TAA 29:TAA 73:TAA 106:TAA 201:TAA 248:TAA 332:TAA 378:TAA 385:TAA 448:TAA 480:TAA 512:TAA 722:TAA 743:TAA 857:TAA 864:TAA 870:TAA 934:TAA 974:TAA 1008:TAA 1019:TAA 1265:TAA
The output of my script:
#!usr/bin/perl use strict; use warnings; my $substring = 'TAA'; while (<>) { chomp; next if $. < 2; # Skip first line my $found = index($_, $substring); while ($found != -1) { # check for multiple occurrences in the sam +e string print "Found $substring at $found\n"; my $offset = $found + 1; $found = index($_, $substring, $offset); } } continue { close ARGV if eof; # reset $. } __END__ $ perl bio_2.pl sequence.fa Found TAA at 29 Found TAA at 73 Found TAA at 106 Found TAA at 201 Found TAA at 248 Found TAA at 332 Found TAA at 378 Found TAA at 385 Found TAA at 448 Found TAA at 480 Found TAA at 512 Found TAA at 722 Found TAA at 743 Found TAA at 857 Found TAA at 864 Found TAA at 870 Found TAA at 934 Found TAA at 974 Found TAA at 1008 Found TAA at 1019 Found TAA at 1265
Update2: If you check also the ATG
$ perl bio_2.pl sequence.fa Found ATG at 16 Found ATG at 50 Found ATG at 133 Found ATG at 232 Found ATG at 252 Found ATG at 287 Found ATG at 305 Found ATG at 363 Found ATG at 394 Found ATG at 489 Found ATG at 575 Found ATG at 651 Found ATG at 689 Found ATG at 724 Found ATG at 854 Found ATG at 859 Found ATG at 954 Found ATG at 1014 Found ATG at 1044 Found ATG at 1051 Found ATG at 1145 Found ATG at 1228 Found ATG at 1249 Found ATG at 1272 tinyos@tinyOS:~/Monks$ cat sequence.fa | grep -b -o ATG 16:ATG 50:ATG 133:ATG 232:ATG 252:ATG 287:ATG 305:ATG 363:ATG 394:ATG 489:ATG 575:ATG 651:ATG 689:ATG 724:ATG 854:ATG 859:ATG 954:ATG 1014:ATG 1044:ATG 1051:ATG 1145:ATG 1228:ATG 1249:ATG 1272:ATG
Now check one by one the index and the final output of my script:
$ perl bio.pl sequence.fa Position: 29 String: TAA Position: 48 String: TGA Position: 73 String: TAA Position: 134 String: TGA Position: 201 String: TAA Position: 221 String: TGA Position: 248 String: TAA Position: 288 String: TGA Position: 309 String: TAG Position: 324 String: TGA Position: 332 String: TAA Position: 378 String: TAA Position: 381 String: TAG Position: 395 String: TGA Position: 408 String: TGA Position: 448 String: TAA Position: 505 String: TGA Position: 512 String: TAA Position: 621 String: TGA Position: 700 String: TGA Position: 722 String: TAA Position: 743 String: TAA Position: 752 String: TGA Position: 857 String: TAA Position: 864 String: TAA Position: 957 String: TAG Position: 974 String: TAA Position: 1002 String: TGA Position: 1019 String: TAA Position: 1052 String: TGA Position: 1198 String: TGA Position: 1210 String: TAG Position: 1253 String: TGA Position: 1265 String: TAA
Observe that the script is searching after the ATG as you requested (check indexes). It looks correct to me.
Take a look and give it a try to verify it or not.
Hope this helps instead of making things more confusing.
|
---|