OK, this is the final and correct solution; all of my earlier solutions are wrong.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %HoH;
my @AoH;
my $substring = 'ATG';

sub get_first_index {
    my %hash;
    my ($found, $string) = @_;
    my @tags = ('TAG', 'TAA', 'TGA');

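    # Record each stop codon under the position where it first occurs.
    # Storing the pair inside the loop keeps positions and codons aligned
    # even when one of the codons is not found.
    foreach my $tag (@tags) {
        my $position = index($string, $tag, $found + 2);
        $hash{$position} = $tag if ($position != -1);
    }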

    # sort the hash keys (positions), lowest first
    my @sorted = (sort {$a <=> $b} keys %hash);

    # remove the rest of the keys as we only want the first occurrence
    delete $hash{$_} for @sorted[1 .. $#sorted];
    return \%hash;
}

while (<>) {
    chomp;
    next if $. < 2; # Skip first line
    my $found = index($_, $substring);
    while ($found != -1) {
        my $hash_result = get_first_index( $found, $_ );
        # choose one or the other, whichever you prefer
        $HoH{"Found $substring at $found"} = $hash_result if (%$hash_result);
        push @AoH, "Found $substring at $found", $hash_result if (%$hash_result);
        my $offset = $found + 1;
        $found = index( $_, $substring, $offset );
    }
} continue {
    close ARGV if eof; # reset $.
}

my @keys = keys %HoH;

print scalar(@keys), "\n";

# print Dumper \@AoH;
# print Dumper \%HoH;

__END__

$ perl bio.pl sequence.fa
23

Why is this the correct one? Because:

$ cat sequence.fa | grep -bo ATG
16:ATG
50:ATG
133:ATG
232:ATG
252:ATG
287:ATG
305:ATG
363:ATG
394:ATG
489:ATG
575:ATG
651:ATG
689:ATG
724:ATG
854:ATG
859:ATG
954:ATG
1014:ATG
1044:ATG
1051:ATG
1145:ATG
1228:ATG
1249:ATG
1272:ATG
$ cat sequence.fa | grep -bo ATG | wc -l
24

As you can see from the sample above, grep finds 24 occurrences of ATG. So why does the script report only 23? Because no stop codon (TAG, TAA or TGA) occurs in the data after the last ATG, so that match is never stored.
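
To see the effect in isolation, here is a minimal sketch on a made-up sequence (the sequence and the helper name nearest_stop are my own, not part of the script above). It starts the search at $start + 3, which gives the same result as $found + 2 here because no stop codon begins with G.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the position of the nearest stop codon after the ATG at $start,
# or -1 if none of the three stop codons occurs downstream.
sub nearest_stop {
    my ($seq, $start) = @_;
    my $best = -1;
    for my $stop (qw(TAG TAA TGA)) {
        my $pos = index($seq, $stop, $start + 3);
        next if $pos == -1;
        $best = $pos if $best == -1 || $pos < $best;
    }
    return $best;
}

# Hypothetical toy sequence: the first ATG is followed by TAA,
# the second ATG has no stop codon after it.
my $seq = 'CCATGCAATAACCATGCCC';

my $start = index($seq, 'ATG');
while ($start != -1) {
    my $stop = nearest_stop($seq, $start);
    print "ATG at $start: ",
        ($stop == -1 ? "no stop codon downstream" : "stop at $stop"), "\n";
    $start = index($seq, 'ATG', $start + 1);
}
```

The second ATG is found by index but has nothing to pair with, which is the same situation as the 24th match in sequence.fa.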

Update 2: Since you said you want maximum speed, you can replace the foreach loop with a while loop.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %HoH;
my @AoH;
my $substring = 'ATG';

sub get_first_index {
    my %hash;
    my ($found, $string) = @_;
    my @tags = ('TAG', 'TAA', 'TGA');

    # Record each stop codon under the position where it first occurs.
    # Storing the pair inside the loop keeps positions and codons aligned
    # even when a codon is not found, and @tags does not need rebuilding.
    while (my $tag = shift @tags) {
        my $position = index($string, $tag, $found + 2);
        $hash{$position} = $tag if ($position != -1);
    }

    # sort the hash keys (positions), lowest first
    my @sorted = (sort {$a <=> $b} keys %hash);

    # remove the rest of the keys as we only want the first occurrence
    delete $hash{$_} for @sorted[1 .. $#sorted];
    return \%hash;
}

while (<>) {
    chomp;
    next if $. < 2; # Skip first line
    my $found = index($_, $substring);
    while ($found != -1) {
        my $hash_result = get_first_index( $found, $_ );
        # choose one or the other, whichever you prefer
        $HoH{"Found $substring at $found"} = $hash_result if (%$hash_result);
        push @AoH, "Found $substring at $found", $hash_result if (%$hash_result);
        my $offset = $found + 1;
        $found = index( $_, $substring, $offset );
    }
} continue {
    close ARGV if eof; # reset $.
}

my @keys = keys %HoH;

print scalar(@keys), "\n";

# print Dumper \@AoH;
# print Dumper \%HoH;
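
Whether the while/shift form is really faster than foreach is worth measuring rather than assuming. Below is a rough sketch using the core Benchmark module. The test sequence and the sub names with_foreach and with_while are made up for illustration, so run it against a line from your own data before drawing conclusions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test sequence; substitute a line from your own file.
my $seq   = 'CCATGCAA' . ('GC' x 200) . 'TAACCTGATTAG';
my $found = index($seq, 'ATG');

# Stop codon lookup with a foreach loop over a fixed list.
sub with_foreach {
    my @indexes;
    foreach my $tag ('TAG', 'TAA', 'TGA') {
        my $position = index($seq, $tag, $found + 2);
        push @indexes, $position if $position != -1;
    }
    return \@indexes;
}

# Same lookup with a while loop that shifts tags off a fresh array.
sub with_while {
    my @tags = ('TAG', 'TAA', 'TGA');
    my @indexes;
    while (my $tag = shift @tags) {
        my $position = index($seq, $tag, $found + 2);
        push @indexes, $position if $position != -1;
    }
    return \@indexes;
}

# Run each variant 200_000 times and print a comparison table.
cmpthese(200_000, {
    'foreach_loop' => \&with_foreach,
    'while_shift'  => \&with_while,
});
```

Both subs return the same positions, so any difference in the table comes from the loop construct alone. On short tag lists like this one the gap may well be too small to matter.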

Hope this helps and that you are still following your own question :D

Seeking for Perl wisdom...on the process of learning...not there...yet!

In reply to Re: finding open reading frames by thanos1983
in thread finding open reading frames by ic23oluk
