in reply to Re: Per residue sequence alignment - per character string comparison?
in thread Per residue sequence alignment - per character string comparison?

For repeated searches, building a lookup table can speed the program up, but you should really Benchmark instead of guessing.

Here's what I had tried before you published your code, it counts the dashes, so it adjusts the positions in larger chunks instead of by one:

#!/usr/bin/perl use warnings; use strict; use Syntax::Construct qw{ // }; sub position { my ($seq, $query) = @_; my $pos = my $sum = $query->[1] - $seq->{from}; my $start = 0; my $changed = 0; while (my $count = substr($seq->{string}, $start, $pos + 1) =~ tr/ +-//) { ++$changed; $start = $sum; $pos = $count - 1; $sum += $count; } --$sum if $changed > 1; my $expected = substr $seq->{string}, $sum, 1; return $sum, $expected } sub find { my ($seq, $idx) = @_; my $char; my $pos; if('-' ne ( $char = substr $seq->{string}, $idx, 1 )) { $pos = $seq->{from} + $idx; $pos -= substr($seq->{string}, 0, $idx) =~ tr/-//; } return $char, $pos } my %seq_a = ( from => 36, to => 190, string => 'LTIEAVPSNAAEGKEVLLLVHNLPQDPRGYNWYKGETVDANRRIJ +GYVISNQQITPGPAYSNRETIYPNASLXMRNVTRNDTGSYTLQVIKLNLMSEEVTGQ-FSVHPETPKPS +ISSNNSNPVEDKDAVAFTCEPETQNTTYLWWVNGQSLPVSP' ); my %seq_b = ( from => 206, to => 334, string => 'PTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQE- +--------------------------LFISNITEKNSGLYTCQANNSASGHSRTTVKTIYVSAELPKPS +ISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSP' ); use Test::More; my %tab; for my $pos ($seq_b{from} .. $seq_b{to}) { my ($idx, $char) = position(\%seq_b, [ q() => $pos ]); $tab{"$char$pos"} = join q(), map $_ // 'undef', find(\%seq_a, $id +x); } sub assert { is $tab{"$_[0]$_[1]"}, "$_[2]$_[3]", "$_[0]$_[1]"; } assert(P => 206, L => 36); assert(E => 249, I => 79); assert(L => 250, L => 107); assert(F => 251, X => 108); assert(I => 252, M => 109); assert(S => 253, R => 110); assert(N => 254, N => 111); assert(I => 255, V => 112); assert(E => 257, R => 114); assert(A => 271, N => 128); assert(S => 272, L => 129); assert(G => 273, M => 130); assert(H => 274, S => 131); assert(S => 275, E => 132); assert(R => 276, E => 133); assert(T => 277, V => 134); assert(T => 278, T => 135); assert(V => 279, G => 136); assert(K => 280, Q => 137); assert(T => 281, '-' => 'undef'); assert(I => 282, F => 138); done_testing();

I've changed the sequnces a bit to detect off-by-one errors more easily.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

Replies are listed 'Best First'.
Re^3: Per residue sequence alignment - per character string comparison?
by proteins (Novice) on Aug 10, 2016 at 13:27 UTC

    Thank you for the comprehensive reply. Some of this code is beyond my current Perl comprehension, but I'll work through it and let you know if I have any questions.

    I am currently writing the code to read in the alignments and residues to align from a large text file. I'll run Benchmark on a subset of the data once that is working.