Re^2: Per residue sequence alignment - per character string comparison?

For repeated searches, building a lookup table can speed the program up, but you should really Benchmark instead of guessing.

Here's what I had tried before you published your code, it counts the dashes, so it adjusts the positions in larger chunks instead of by one:

#!/usr/bin/perl
use warnings;
use strict;
use Syntax::Construct qw{ // };


sub position {
    my ($seq, $query) = @_;
    my $pos = my $sum = $query->[1] - $seq->{from};
    my $start = 0;
    my $changed = 0;

    while (my $count = substr($seq->{string}, $start, $pos + 1) =~ tr/
+-//) {
        ++$changed;
        $start = $sum;
        $pos = $count - 1;
        $sum += $count;
    }
    --$sum if $changed > 1;
    my $expected = substr $seq->{string}, $sum, 1;
    return $sum, $expected
}


sub find {
    my ($seq, $idx) = @_;
    my $char;
    my $pos;
    if('-' ne ( $char = substr $seq->{string}, $idx, 1 )) {
        $pos = $seq->{from} + $idx;
        $pos -= substr($seq->{string}, 0, $idx) =~ tr/-//;
    }
    return $char, $pos
}


my %seq_a = ( from   => 36,
              to     => 190,
              string => 'LTIEAVPSNAAEGKEVLLLVHNLPQDPRGYNWYKGETVDANRRIJ
+GYVISNQQITPGPAYSNRETIYPNASLXMRNVTRNDTGSYTLQVIKLNLMSEEVTGQ-FSVHPETPKPS
+ISSNNSNPVEDKDAVAFTCEPETQNTTYLWWVNGQSLPVSP' );

my %seq_b = ( from   => 206,
              to     => 334,
              string => 'PTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQE-
+--------------------------LFISNITEKNSGLYTCQANNSASGHSRTTVKTIYVSAELPKPS
+ISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSP' );


use Test::More;

my %tab;
for my $pos ($seq_b{from} .. $seq_b{to}) {
    my ($idx, $char) = position(\%seq_b, [ q() => $pos ]);
    $tab{"$char$pos"} = join q(), map $_ // 'undef', find(\%seq_a, $id
+x);
}


sub assert {
    is $tab{"$_[0]$_[1]"}, "$_[2]$_[3]", "$_[0]$_[1]";
}


assert(P => 206, L => 36);

assert(E => 249, I => 79);
assert(L => 250, L => 107);
assert(F => 251, X => 108);
assert(I => 252, M => 109);
assert(S => 253, R => 110);
assert(N => 254, N => 111);
assert(I => 255, V => 112);
assert(E => 257, R => 114);

assert(A => 271, N => 128);
assert(S => 272, L => 129);
assert(G => 273, M => 130);
assert(H => 274, S => 131);
assert(S => 275, E => 132);
assert(R => 276, E => 133);
assert(T => 277, V => 134);
assert(T => 278, T => 135);

assert(V => 279, G => 136);
assert(K => 280, Q => 137);
assert(T => 281, '-' => 'undef');
assert(I => 282, F => 138);

done_testing();
[download]

I've changed the sequnces a bit to detect off-by-one errors more easily.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
[download]

Comment on Re^2: Per residue sequence alignment - per character string comparison? Select or Download Code

Replies are listed 'Best First'.
Re^3: Per residue sequence alignment - per character string comparison? by proteins (Novice) on Aug 10, 2016 at 13:27 UTC
Thank you for the comprehensive reply. Some of this code is beyond my current Perl comprehension, but I'll work through it and let you know if I have any questions. I am currently writing the code to read in the alignments and residues to align from a large text file. I'll run Benchmark on a subset of the data once that is working.	[reply]