Now that I understand the question, here's another attempt. :)
    sub align {
        my($hash, $seq) = @_;
        my $best = { totlen => 0 };
        my @match;

        # "ACD" becomes A(.*?)C(.*?)D - one lazy capture per gap
        my $re = join '(.*?)', ($seq =~ /(\w)/g);

        # first pass: match each string once and record everything we need
        for my $key (sort keys %$hash) {
            my $string = join '', @{ $hash->{$key} };
            next unless $string =~ $re;
            my $offset = $-[0];
            my $length = $+[0] - $offset;
            my $match = {
                key     => $key,
                matched => substr($string, $offset, $length),
                totlen  => $length,
                # end of each gap, relative to the start of the match
                inserts => [ map $_ - $offset, @+[1 .. $#-] ],
                # length of each gap
                lengths => [ map $+[$_] - $-[$_], 1 .. $#+ ],
            };
            push @match, $match;
            $best = $match if $match->{totlen} > $best->{totlen};
        }

        # second pass: pad each gap that is shorter than the same gap in $best
        return [ map {
            my $match = $_;
            my $string = $match->{matched};
            # work right to left so earlier insert positions stay valid
            for (reverse 0 .. $#{ $match->{lengths} }) {
                my $bestlen = $best->{lengths}[$_];
                my $curlen = $match->{lengths}[$_];
                if ($bestlen > $curlen) {
                    substr($string, $match->{inserts}[$_], 0) = "-" x ($bestlen - $curlen);
                }
            }
            $string;
        } @match ];
    }
The idea is to apply the regexp to each string only once, and capture all relevant information at that point (also recording the longest match along the way), then take a second pass through all the matched substrings to add the hyphenation.
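To make the first pass a bit more concrete, here is a minimal sketch of what gets captured for a single string. The helper name gap_info and the sample string are made up for illustration; only the use of @- and @+ mirrors the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of align): match once, then read the
# gap positions and lengths out of @- and @+.
sub gap_info {
    my ($string, $seq) = @_;
    my $re = join '(.*?)', ($seq =~ /(\w)/g);   # "ACD" -> A(.*?)C(.*?)D
    return unless $string =~ $re;
    my $offset = $-[0];                          # where the whole match starts
    return {
        matched => substr($string, $offset, $+[0] - $offset),
        # end of each captured gap, relative to the start of the match
        inserts => [ map($+[$_] - $offset, 1 .. $#+) ],
        # length of each captured gap
        lengths => [ map($+[$_] - $-[$_], 1 .. $#+) ],
    };
}

my $info = gap_info("xxACHHDH", "ACD");
print "$info->{matched}\n";           # ACHHD
print "@{ $info->{inserts} }\n";      # 1 4
print "@{ $info->{lengths} }\n";      # 0 2
```

The insert positions are the ends of the gaps, so padding added there lands just before the next letter of the sequence.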
It isn't clear from the example what should happen if one of the strings to be modified already has a gap longer than that required, e.g. testing "ACD" against "ABICID" and "ACHHDH" - the code above would leave such a gap alone, yielding [ "ABICID", "A--CHHD" ], but if it should instead be truncated or marked in some other way you'd need to add an ... elsif ($bestlen < $curlen) ... chunk near the bottom.
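If truncation is what's wanted, the extra branch might look something like the sketch below. This is only one guess at the desired behaviour: fit_gap is a hypothetical standalone helper, and dropping the surplus characters from the end of the gap is my assumption, not something the original question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: make the gap that ends at position $end in $string
# exactly $want characters long, given that it is currently $have long.
sub fit_gap {
    my ($string, $end, $have, $want) = @_;
    if ($want > $have) {
        # too short: pad with hyphens, as align does
        substr($string, $end, 0) = "-" x ($want - $have);
    }
    elsif ($want < $have) {
        # too long: drop the surplus from the end of the gap (assumption)
        my $surplus = $have - $want;
        substr($string, $end - $surplus, $surplus) = "";
    }
    return $string;
}

print fit_gap("ACHHD", 1, 0, 2), "\n";   # A--CHHD
print fit_gap("ACHHD", 4, 2, 1), "\n";   # ACHD
```

As in align, you'd want to apply this to the gaps from right to left, so that changing one gap doesn't shift the positions of the gaps still to be processed.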
Hugo
In reply to Re: Simple String Alignment
by hv
in thread Simple String Alignment
by monkfan