Hello ic23oluk,

I do not have that much time to modify and experiment with your code but a few minor modifications can make your script faster:

#!usr/bin/perl use strict; use warnings; my @starts = (); while (<>) { chomp; next if $. < 2; # Skip first line my @seq = split '', $_; for (0..$#seq-5){ if ($seq[$_] eq 'A' && $seq[$_+1] eq 'T' && $seq[$_+2] eq 'G') { push (@starts, $_); for (my $j=$_+3; $j<=$#seq-2; $j=$j+3){ if ( ($seq[$j] eq 'T' && $seq[$j+1] eq 'A' && $seq[$j+2] eq 'A +') || ($seq[$j] eq 'T' && $seq[$j+1] eq 'G' && $seq[$j+2] eq 'A +') || ($seq[$j] eq 'T' && $seq[$j+1] eq 'A' && $seq[$j+2] eq 'G +') ) { print "ORF: $_-", ($j+2), "\n"; last; ##lasts the j loop } } } } } continue { close ARGV if eof; # reset $. } __END__ $ perl bio_test.pl sequence.fa ORF: 16-75 ORF: 50-136 ORF: 133-357 ORF: 232-357 ORF: 252-290 ORF: 287-334 ORF: 305-334 ORF: 363-380 ORF: 394-450 ORF: 489-623 ORF: 575-724 ORF: 651-797 ORF: 689-724 ORF: 724-936 ORF: 854-859 ORF: 859-936 ORF: 954-959 ORF: 1051-1200 ORF: 1145-1255 ORF: 1228-1275 ORF: 1249-1275

Update: Another possible way, this is how I would approach it. This is 3/4 of the solution, I do not want to give you the full solution as this looks like school work. It should not be really difficult to continue from here.

As I said before use index for multiple occurrences in the string.

#!usr/bin/perl use strict; use warnings; use Data::Dumper; my $substring = 'ATG'; while (<>) { chomp; next if $. < 2; # Skip first line my @seq = split ' ', $_; # print Dumper \@seq; # Remove hashtag and see how the input looks for (@seq) { my $found = index($_, $substring); while ($found != -1) { # check for multiple occurrences in the sam +e string print "Found $substring at $found\n"; # Do your calculations h +ere my $offset = $found + 1; # keep track of offset in your counte +r as you do on your primary script and should have it $found = index($_, $substring, $offset); } } } continue { close ARGV if eof; # reset $. } __END__ $ perl bio_test.pl sequence.fa Found 'ATG' at 16 Found 'ATG' at 50 Found 'ATG' at 62 Found 'ATG' at 19 Found 'ATG' at 39 Found 'ATG' at 3 Found 'ATG' at 21 Found 'ATG' at 8 Found 'ATG' at 39 Found 'ATG' at 63 Found 'ATG' at 7 Found 'ATG' at 12 Found 'ATG' at 50 Found 'ATG' at 14 Found 'ATG' at 2 Found 'ATG' at 7 Found 'ATG' at 31 Found 'ATG' at 20 Found 'ATG' at 50 Found 'ATG' at 57 Found 'ATG' at 9 Found 'ATG' at 21 Found 'ATG' at 42 Found 'ATG' at 65

Notice that the index is going back to zero, keep track of it and it should give you the same results as your primary script. Let us know if there is something that is not clear.

Update2: If the space in between the sequences in normal you do not need to split and you can loop through like this:

#!usr/bin/perl use strict; use warnings; my $substring = 'ATG'; while (<>) { chomp; next if $. < 2; # Skip first line my $found = index($_, $substring); while ($found != -1) { # check for multiple occurrences in the sam +e string print "Found $substring at $found\n"; # Do your calculations here my $offset = $found + 1; # keep track of offset in your counter as + you do on your primary script and should have it $found = index($_, $substring, $offset); } } continue { close ARGV if eof; # reset $. } __END__ $ perl bio.pl sequence.fa Found ATG at 16 Found ATG at 50 Found ATG at 133 Found ATG at 232 Found ATG at 252 Found ATG at 287 Found ATG at 305 Found ATG at 363 Found ATG at 394 Found ATG at 489 Found ATG at 575 Found ATG at 651 Found ATG at 689 Found ATG at 724 Found ATG at 854 Found ATG at 859 Found ATG at 954 Found ATG at 1014 Found ATG at 1044 Found ATG at 1051 Found ATG at 1145 Found ATG at 1228 Found ATG at 1249 Found ATG at 1272

Now you just need to apply an if condition and voila. :D

Hope this helps.

Seeking for Perl wisdom...on the process of learning...not there...yet!

In reply to Re^3: finding open reading frames by thanos1983
in thread finding open reading frames by ic23oluk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.