For repeated searches, building a lookup table can speed the program up, but you should really Benchmark instead of guessing.
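To make "Benchmark instead of guessing" concrete, here is a minimal sketch using the core Benchmark module. The sequence, the sub name residue_number, and the @table layout are made up for illustration and are not taken from your code; which variant wins on your real data is exactly what the benchmark is there to tell you.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Benchmark qw{ cmpthese };

# Hypothetical gapped sequence; residue numbering starts at $from.
my $from   = 206;
my $string = 'PTISPSYTYY' . '-' x 20 . 'LFISNITEKN' x 50;

# Variant 1: recompute on every call by counting the dashes before $idx.
sub residue_number {
    my ($idx) = @_;
    return undef if '-' eq substr $string, $idx, 1;
    return $from + $idx - (substr($string, 0, $idx) =~ tr/-//);
}

# Variant 2: build a lookup table once (column index => residue number,
# undef for gap columns).
my @table;
{
    my $number = $from;
    for my $idx (0 .. length($string) - 1) {
        $table[$idx] = '-' eq substr($string, $idx, 1) ? undef : $number++;
    }
}

# Random column indices to look up.
my @queries = map { int rand length $string } 1 .. 1000;

# Run each variant for at least one CPU second and print a comparison.
cmpthese(-1, {
    recompute => sub { my @r = map residue_number($_), @queries },
    table     => sub { my @r = map $table[$_], @queries },
});
```

The table only pays off if you query the same sequence many times; for a handful of lookups, building it may cost more than it saves.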

Here's what I had tried before you published your code. It counts the dashes, so it adjusts the positions in larger chunks instead of one at a time:

#!/usr/bin/perl
use warnings;
use strict;
use Syntax::Construct qw{ // };


sub position {
    my ($seq, $query) = @_;
    my $pos = my $sum = $query->[1] - $seq->{from};
    my $start = 0;
    my $changed = 0;

    while (my $count = substr($seq->{string}, $start, $pos + 1) =~ tr/-//) {
        ++$changed;
        $start = $sum;
        $pos = $count - 1;
        $sum += $count;
    }
    --$sum if $changed > 1;
    my $expected = substr $seq->{string}, $sum, 1;
    return $sum, $expected
}


sub find {
    my ($seq, $idx) = @_;
    my $char;
    my $pos;
    if('-' ne ( $char = substr $seq->{string}, $idx, 1 )) {
        $pos = $seq->{from} + $idx;
        $pos -= substr($seq->{string}, 0, $idx) =~ tr/-//;
    }
    return $char, $pos
}


my %seq_a = ( from   => 36,
              to     => 190,
              string => 'LTIEAVPSNAAEGKEVLLLVHNLPQDPRGYNWYKGETVDANRRIJGYVISNQQITPGPAYSNRETIYPNASLXMRNVTRNDTGSYTLQVIKLNLMSEEVTGQ-FSVHPETPKPSISSNNSNPVEDKDAVAFTCEPETQNTTYLWWVNGQSLPVSP' );

my %seq_b = ( from   => 206,
              to     => 334,
              string => 'PTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQE---------------------------LFISNITEKNSGLYTCQANNSASGHSRTTVKTIYVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSP' );


use Test::More;

my %tab;
for my $pos ($seq_b{from} .. $seq_b{to}) {
    my ($idx, $char) = position(\%seq_b, [ q() => $pos ]);
    $tab{"$char$pos"} = join q(), map $_ // 'undef', find(\%seq_a, $idx);
}


sub assert {
    is $tab{"$_[0]$_[1]"}, "$_[2]$_[3]", "$_[0]$_[1]";
}


assert(P => 206, L => 36);

assert(E => 249, I => 79);
assert(L => 250, L => 107);
assert(F => 251, X => 108);
assert(I => 252, M => 109);
assert(S => 253, R => 110);
assert(N => 254, N => 111);
assert(I => 255, V => 112);
assert(E => 257, R => 114);

assert(A => 271, N => 128);
assert(S => 272, L => 129);
assert(G => 273, M => 130);
assert(H => 274, S => 131);
assert(S => 275, E => 132);
assert(R => 276, E => 133);
assert(T => 277, V => 134);
assert(T => 278, T => 135);

assert(V => 279, G => 136);
assert(K => 280, Q => 137);
assert(T => 281, '-' => 'undef');
assert(I => 282, F => 138);

done_testing();

I've changed the sequences a bit to detect off-by-one errors more easily.
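If the chunked adjustment is hard to follow in the full program, here is the idea in isolation. This is a simplified sketch, not the position sub from the post: the name index_of_residue and the toy string are made up, and it assumes the requested residue actually exists in the string.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Return the column of the residue with the 0-based offset $n, skipping
# gaps. Instead of stepping one column at a time, count the dashes in the
# chunk just covered and jump ahead by that many columns.
sub index_of_residue {
    my ($string, $n) = @_;
    my $start = 0;     # first column of the chunk not yet inspected
    my $end   = $n;    # candidate column, correct if there were no gaps
    while (my $dashes = substr($string, $start, $end - $start + 1) =~ tr/-//) {
        $start = $end + 1;    # the next chunk begins right after the old one
        $end  += $dashes;     # each dash pushes the residue one column right
    }
    return $end;
}

my $aligned = 'AB--CD-E';
printf "residue %d is in column %d\n", $_, index_of_residue($aligned, $_)
    for 0 .. 4;
```

The loop stops as soon as a chunk contains no dashes, so a long run of gaps costs one or two iterations rather than one per dash.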

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

In reply to Re^2: Per residue sequence alignment - per character string comparison? by choroba
in thread Per residue sequence alignment - per character string comparison? by proteins