There's a bug... an off by one error on the last iteration: it should read length( $consensus ) - 1;
Actually they could be reduced to a single loop like this:
use strict;
use warnings;
my $consensus = 'abc*def*ghi';
my( @index_c, @index_n );
my $idx = 0;
my $len = length( $consensus );
push( @{ substr( $consensus, $idx, 1 ) eq '*' ? \@index_c : \@index_n
+}, $idx++ )
while $idx < $len;
print "C: @index_c\n";
print "N: @index_n\n";
This trades three loops (two explicit -- grep, and one implicit -- the initialization of @indexes) for a single while loop. It also reduces the calls to substr in half.
Explaining the original (grep) method: Create an array that contains the indices of each position within the string $consensus. Then iterate over the array checking the character in $consensus at each position for the existence of the character '*'. If the condition is true ('*' exists in that position), place that index position in @index_c. Repeat the process testing for non-existence of '*', placing all indices where '*' doesn't exist into @index_n.
Explaining my refactor: Create some empty index holders (@index_c, @index_n). Create a counter variable and a variable that holds the length of $consensus ( $idx, $len ). Then we take the length of $consensus just one time and store it (rather than call length in the while loop conditional). Next iterate over the $idx values from 0 through the last index position of $consensus ($idx++ < $len). At each position use substr to access and test whether the character at position $idx contains a '*'. The ternary operator exposes either a reference to @index_c or a reference to @index_n to the @{...} dereference construct, creating an lvalue. We then push $idx (the current position) onto either @index_c or @index_n depending on the outcome of the conditional test. Repeat this process for each character position from 0 through $len-1.
That changes the algorithm efficiency from three O(n) loops to one O(n) loop. However, it does sacrifice coding clarity, which could be restored like this:
my $consensus = 'abc*def*ghi';
my( @index_c, @index_n );
my $idx = 0;
my $len = length( $consensus );
while( $idx < $len ) {
if( substr( $consensus, $idx, 1 ) eq '*' ) {
push @index_c, $idx;
}
else {
push @index_n, $idx;
}
$idx++;
}
print "C: @index_c\n";
print "N: @index_n\n";
|