Hello ic23oluk,

OK, this is the final and correct solution; all of my earlier solutions are wrong.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %HoH;
    my @AoH;
    my $substring = 'ATG';

    sub get_first_index {
        my %hash;
        my ($found, $string) = @_;
        my @tags = ('TAG', 'TAA', 'TGA');
        foreach my $tag (@tags) {
            my $position = index($string, $tag, $found + 2);
            # key each hit by its position so that position and codon
            # stay paired even when one of the codons is not found
            $hash{$position} = $tag if ($position != -1);
        }
        # sort the keys numerically, lowest position first
        my @sorted = (sort {$a <=> $b} keys %hash);
        # remove the rest of the keys as we only want the first occurrence
        delete $hash{$_} for @sorted[1 .. $#sorted];
        return \%hash;
    }

    while (<>) {
        chomp;
        next if $. < 2; # Skip first line
        my $found = index($_, $substring);
        while ($found != -1) {
            my $hash_result = get_first_index( $found, $_ );
            # choose one or the other, whichever you prefer
            $HoH{"Found $substring at $found"} = $hash_result if (%$hash_result);
            push @AoH, "Found $substring at $found", $hash_result if (%$hash_result);
            my $offset = $found + 1;
            $found = index( $_, $substring, $offset );
        }
    } continue {
        close ARGV if eof; # reset $.
    }

    my @keys = keys %HoH;
    print scalar(@keys), "\n";
    # print Dumper \@AoH;
    # print Dumper \%HoH;

    __END__

    $ perl bio.pl sequence.fa
    23

Why is this the correct one? Because:

    $ cat sequence.fa | grep -bo ATG
    16:ATG
    50:ATG
    133:ATG
    232:ATG
    252:ATG
    287:ATG
    305:ATG
    363:ATG
    394:ATG
    489:ATG
    575:ATG
    651:ATG
    689:ATG
    724:ATG
    854:ATG
    859:ATG
    954:ATG
    1014:ATG
    1044:ATG
    1051:ATG
    1145:ATG
    1228:ATG
    1249:ATG
    1272:ATG
    tinyos@tinyOMN:~/Monks$ cat sequence.fa | grep -bo ATG | wc -l
    24

As you can see from the sample above, the keyword ATG matches 24 times. So why does the script above report 23? Simply because none of the stop codons (TAG, TAA, TGA) occurs in the data file after the last ATG, so that last match is not recorded.
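To make that concrete, here is a minimal sketch on a made-up sequence (not your sequence.fa). The helper names positions_of and has_downstream_stop are only for illustration; they are not part of the script above. The stop codon search starts at $start + 2, the same offset the script uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: positions of every occurrence of $needle in $string.
sub positions_of {
    my ($string, $needle) = @_;
    my @positions;
    my $pos = index($string, $needle);
    while ($pos != -1) {
        push @positions, $pos;
        $pos = index($string, $needle, $pos + 1);
    }
    return @positions;
}

# Hypothetical helper: true if any stop codon is found when searching
# from $start + 2 onwards (same offset as in the script above).
sub has_downstream_stop {
    my ($string, $start) = @_;
    for my $stop (qw(TAG TAA TGA)) {
        return 1 if index($string, $stop, $start + 2) != -1;
    }
    return 0;
}

# Made-up toy sequence: two ATGs, but only the first has a stop after it.
my $sequence  = 'CCATGAAATAGCCATGCC';
my @starts    = positions_of($sequence, 'ATG');
my @with_stop = grep { has_downstream_stop($sequence, $_) } @starts;

printf "%d ATG matches, %d with a stop codon after them\n",
    scalar @starts, scalar @with_stop;
# prints: 2 ATG matches, 1 with a stop codon after them
```

The second ATG is found by the plain search, just like grep finds it, but it has nothing after it to pair with, so it drops out of the count.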

Update2: Since you said you want maximum speed, you can replace the foreach loop with a while loop.

    #!/usr/bin/perl
    use feature 'say';
    use strict;
    use warnings;
    use Data::Dumper;

    my %HoH;
    my @AoH;
    my $substring = 'ATG';

    sub get_first_index {
        my %hash;
        my ($found, $string) = @_;
        my @tags = ('TAG', 'TAA', 'TGA');
        # the loop empties @tags, so store each hit while $tag is in hand
        while (my $tag = shift @tags) {
            my $position = index($string, $tag, $found + 2);
            $hash{$position} = $tag if ($position != -1);
        }
        # sort the keys numerically, lowest position first
        my @sorted = (sort {$a <=> $b} keys %hash);
        # remove the rest of the keys as we only want the first occurrence
        delete $hash{$_} for @sorted[1 .. $#sorted];
        return \%hash;
    }

    while (<>) {
        chomp;
        next if $. < 2; # Skip first line
        my $found = index($_, $substring);
        while ($found != -1) {
            my $hash_result = get_first_index( $found, $_ );
            # choose one or the other, whichever you prefer
            $HoH{"Found $substring at $found"} = $hash_result if (%$hash_result);
            push @AoH, "Found $substring at $found", $hash_result if (%$hash_result);
            my $offset = $found + 1;
            $found = index( $_, $substring, $offset );
        }
    } continue {
        close ARGV if eof; # reset $.
    }

    my @keys = keys %HoH;
    print scalar(@keys), "\n";
    # print Dumper \@AoH;
    # print Dumper \%HoH;
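Whether the while loop really is faster on your machine is easy to measure with the core Benchmark module. This is only a rough sketch: the test string is made up, and stops_foreach and stops_while are hypothetical stand-ins for the two loop styles, not the full subroutine from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test string; replace it with a real line from your data.
my $string = ('ACGT' x 50) . 'ATGCCCTAACCCTGACCCTAG';

# Variant 1: foreach over the list of stop codons.
sub stops_foreach {
    my ($str, $from) = @_;
    my %hash;
    foreach my $tag ('TAG', 'TAA', 'TGA') {
        my $position = index($str, $tag, $from);
        $hash{$position} = $tag if $position != -1;
    }
    return \%hash;
}

# Variant 2: while loop that shifts the codons off a temporary array.
sub stops_while {
    my ($str, $from) = @_;
    my %hash;
    my @tags = ('TAG', 'TAA', 'TGA');
    while (my $tag = shift @tags) {
        my $position = index($str, $tag, $from);
        $hash{$position} = $tag if $position != -1;
    }
    return \%hash;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    'foreach'     => sub { stops_foreach($string, 0) },
    'while_shift' => sub { stops_while($string, 0) },
});
```

With only three codons in the list I would not expect a big difference either way, so it is worth running this against your real data before deciding.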

Hope this helps and that you are still following your own question :D

Seeking for Perl wisdom...on the process of learning...not there...yet!

In reply to Re: finding open reading frames by thanos1983
in thread finding open reading frames by ic23oluk
