There's a fair chunk of unconventional and sloppy code in there which I haven't time to comment on blow by blow, so instead here's the first chunk of the code cleaned up somewhat:
use strict;
use warnings;
my $start_time = time;
my ($input1, $input2) = @ARGV;
open my $in, '<', $input1 or die "Can't read source file $input1: $!\n";
my @lengths = grep { !m/>/ } <$in>;
close $in;
chomp @lengths;
open $in, '<', $input2 or die "Can't read source file $input2 : $!\n";
my @source = <$in>;
close $in;
chomp @source;
#********************#
# CALCULATE LENGTH DISTRIBUTION FROM INPUT FILE #1
#********************#
my @sorted = sort {$a <=> $b} @lengths;
my %seen;
my @uniques = grep {!$seen{$_}++} @sorted;
# hash of predicted sORF length (key) and number of times (value) that size is
# observed in the multifasta input file #1
my %dstrbtn_hash;
for my $len (@uniques) {
    $dstrbtn_hash{$len} = grep { $len == $_ } @sorted;
}
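As an aside, the sort/unique/grep dance above is doing a lot of work that a hash will do for you in a single pass: incrementing a bucket per length gives you the same distribution without the O(n²) inner grep. A minimal sketch (the sample lengths here are made-up stand-ins for the parsed FASTA data):

```perl
use strict;
use warnings;

# Hypothetical lengths standing in for the data read from input file #1.
my @lengths = (30, 45, 30, 60, 45, 30);

# Build the length distribution in one pass: each length increments its
# own bucket, so no pre-sorting or uniqueness filtering is needed.
my %dstrbtn_hash;
$dstrbtn_hash{$_}++ for @lengths;

# %dstrbtn_hash is now (30 => 3, 45 => 2, 60 => 1).
# Sort the keys numerically only at output time, if order matters:
for my $len (sort { $a <=> $b } keys %dstrbtn_hash) {
    print "$len\t$dstrbtn_hash{$len}\n";
}
```

That replaces @sorted, %seen, @uniques and the counting loop with two lines, and scales linearly with the number of sequences.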
This probably doesn't solve the problem by itself, but it may point you in the direction of better technique.
I suspect the real issue is in the EXTRACT and START "loops": depending on the input data, those loops could spend an indeterminately long time achieving very little. A small sample of your input data would help us understand what's supposed to be going on there and find a more deterministic way of calculating the values you need.
Perl is the programming world's equivalent of English