in reply to Re^4: finding open reading frames
in thread finding open reading frames

Hello Anonymous Monk

Maybe you are right; let's benchmark them and see:

I know that a while loop is faster than a foreach loop, but in my case it did not make much difference; I was mainly aiming to remove unnecessary lines.

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark::Forking qw( timethese cmpthese );    # UnixOS
# use Benchmark qw(:all) ;                          # WindowsOS

my @starts;

sub previous {
    open (FASTA, "sequence.fa") || die "Cannot open file: $!.\n";
    chomp (my @seq = <FASTA>);
    close FASTA;
    shift @seq;    # drop the FASTA header line
    my $sequence = join ('', @seq);
    @seq = split ('', $sequence);

    for (my $i=0; $i<=$#seq-5; $i++){    ## the -5 could be omitted
        # start codon: ATG
        # stop codon: TAA, TGA, TAG
        # multiple of 3 between start and stop
        if ($seq[$i] eq 'A' && $seq[$i+1] eq 'T' && $seq[$i+2] eq 'G') {
            push (@starts, $i);
            for (my $j=$i+3; $j<=$#seq-2; $j=$j+3){
                if ( ($seq[$j] eq 'T' && $seq[$j+1] eq 'A' && $seq[$j+2] eq 'A') ||
                     ($seq[$j] eq 'T' && $seq[$j+1] eq 'G' && $seq[$j+2] eq 'A') ||
                     ($seq[$j] eq 'T' && $seq[$j+1] eq 'A' && $seq[$j+2] eq 'G') ) {
                    # print "ORF: $i-", ($j+2), "\n";
                    last;    ## exits the $j loop
                }
            }
        }
    }
    return;
}

sub update {
    open my $fh, "sequence.fa" or die "Could not open file: $!";
    # Note: each line is scanned on its own here, so an ORF that spans
    # a line break in the file is not found.
    while (defined( $_ = <$fh>)) {
        chomp;
        next if $. < 2;    # Skip first line
        my @seq = split '', $_;
        for (0..$#seq-5){
            if ($seq[$_] eq 'A' && $seq[$_+1] eq 'T' && $seq[$_+2] eq 'G') {
                push (@starts, $_);
                for (my $j=$_+3; $j<=$#seq-2; $j=$j+3){
                    if ( ($seq[$j] eq 'T' && $seq[$j+1] eq 'A' && $seq[$j+2] eq 'A') ||
                         ($seq[$j] eq 'T' && $seq[$j+1] eq 'G' && $seq[$j+2] eq 'A') ||
                         ($seq[$j] eq 'T' && $seq[$j+1] eq 'A' && $seq[$j+2] eq 'G') ) {
                        # print "ORF: $_-", ($j+2), "\n";
                        last;    ## exits the $j loop
                    }
                }
            }
        }
    }
    continue {
        # close ARGV if eof;    # reset $.
    }
    close $fh or die "Could not close file: $!";
    return;
}

my $results = timethese(1000000, { Previous => \&previous,
                                   Updated  => \&update }, 'none');
cmpthese( $results );

__END__

$ perl bio_test.pl
           Rate Previous  Updated
Previous 2224/s       --     -14%
Updated  2601/s      17%       --

See also my updated proposed solution; the second update should resolve the question and should also be faster.

Seeking for Perl wisdom...on the process of learning...not there...yet!

Re^6: finding open reading frames
by Anonymous Monk on Jun 06, 2017 at 17:46 UTC
    The O(n**2) nested-loop performance is going to kill you on some datasets. For a really pathological one, try:
    $sequence = 'ATG' x 1e6;
    I estimate that your code would take about a day and a half to process that. My code handles it in just over a second.
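
To see where the quadratic cost comes from: in 'ATG' x 1e6 every codon is a start and there is no stop, so the inner loop runs to the end of the sequence for each start, roughly 5e11 inner iterations in total. The code referred to above is not shown in this post, so the following is only a minimal sketch of one way to get linear time, not that implementation. It assumes the sequence is held as one uppercase string; the name find_orfs is hypothetical. It walks the string once from right to left and remembers, for each of the three reading frames, where the nearest downstream stop codon begins.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [start, end] pairs (0-based, inclusive) for
# every ATG that has an in-frame stop codon downstream of it.
sub find_orfs {
    my ($seq) = @_;
    my %is_stop = map { $_ => 1 } qw(TAA TGA TAG);
    my @next_stop;    # per frame: start index of the nearest stop to the right
    my @orfs;

    # Walk right to left so the nearest downstream stop is always known.
    for (my $i = length($seq) - 3; $i >= 0; $i--) {
        my $codon = substr($seq, $i, 3);
        my $frame = $i % 3;
        if ($is_stop{$codon}) {
            $next_stop[$frame] = $i;
        }
        elsif ($codon eq 'ATG' && defined $next_stop[$frame]) {
            push @orfs, [ $i, $next_stop[$frame] + 2 ];
        }
    }
    return reverse @orfs;    # ascending start order (call in list context)
}

# Prints "ORF: 0-8" and "ORF: 9-14".
for my $orf (find_orfs('ATGAAATAGATGTGA')) {
    print "ORF: $orf->[0]-$orf->[1]\n";
}
```

Each position is visited once, so the pathological 'ATG' x 1e6 input costs about three million substr calls instead of a scan to the end for every start codon. Like the benchmarked subs, this reports every start codon that has a stop in its frame, so nested starts sharing one stop each produce their own entry.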

    The human genome is around 3e9 base-pairs long. That's small enough to fit it all in memory, but large enough that you need to use efficient algorithms on it.
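
Fitting the genome in memory depends on how it is stored. As a single Perl string, 3e9 bases take about 3 GB at one byte per base, whereas an array built with split '' holds one scalar per base and costs tens of bytes for each. A minimal sketch of reading a sequence into one string follows; the name read_fasta_sequence is hypothetical, and it assumes a single-record FASTA file whose header lines start with '>'.

```perl
use strict;
use warnings;

# Hypothetical helper: concatenate all sequence lines from a FASTA
# filehandle into one uppercase string, skipping '>' header lines.
sub read_fasta_sequence {
    my ($fh) = @_;
    my $sequence = '';
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;       # strip newline and any trailing whitespace
        next if $line =~ /^>/;    # skip header lines
        $sequence .= uc $line;
    }
    return $sequence;
}

# Demo on an in-memory filehandle instead of a real file.
my $text = ">seq1 demo\nATGAAA\nTAGATG\nTGA\n";
open my $fh, '<', \$text or die "Could not open in-memory file: $!";
my $sequence = read_fasta_sequence($fh);
close $fh;
print length($sequence), " bases read\n";
```

The string can then be scanned with substr or index directly, without ever building a per-base array.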