Clever mapping of the characters in one of the strings could work around the problems with the XOR approach. For example, using tr/ATCG/HRDZ/:
my %change;  # reverse lookup table

my @t = qw(A T C G);
for my $t1 (@t) {
    for my $t2 (@t) {
        my $t = $t2;
        $t =~ tr/ATCG/HRDZ/;
        $change{ $t1 ^ $t } = "$t1->$t2";
    }
}

sub diff {
    my ($target, $str) = @_;
    $str =~ tr/ATCG/HRDZ/;
    my $diff = $target ^ $str;
    while ($diff =~ /([^\x09\x06\x07\x1d])/g) {
        printf "  %d: %s\n", pos($diff), $change{$1};
    }
}

my $target = "ATTCCGGG";

for (qw(ATTGCGGG ATACCGGC)) {
    print "\n$target\n$_\n";
    diff($target, $_);
}

__END__

ATTCCGGG
ATTGCGGG
  4: C->G

ATTCCGGG
ATACCGGC
  3: T->A
  8: G->C
The reverse lookup table just needs to be set up once, and the remaining operations (string bit operation, tr///, m//g) should all be pretty fast.
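As a quick sanity check on why the mapping matters, the following sketch counts how many distinct XOR results the 16 base pairs produce with and without the tr/ATCG/HRDZ/ step. The names %plain and %mapped are only for illustration. Without the mapping, XOR is symmetric (A^T equals T^A) and every match gives "\0", so the pairs collapse onto a handful of values; with the mapping, each pair gets its own key.

```perl
use strict;
use warnings;

my @bases = qw(A T C G);
my (%plain, %mapped);

for my $t1 (@bases) {
    for my $t2 (@bases) {
        (my $m = $t2) =~ tr/ATCG/HRDZ/;   # mapped copy of the second base
        $plain{ $t1 ^ $t2 }++;            # string XOR of the raw characters
        $mapped{ $t1 ^ $m }++;            # string XOR after the mapping
    }
}

printf "plain:  %d distinct values\n", scalar keys %plain;
printf "mapped: %d distinct values\n", scalar keys %mapped;
```

This prints 7 for the plain case and 16 for the mapped case, which is what lets %change serve as an unambiguous reverse lookup.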
(As the keys in the lookup table are single characters with code points < 256, you could in theory also set up an array, and use ord() of the XOR character as the index.)
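A minimal sketch of that array variant, assuming equal-length strings that contain only A, T, C and G. The helper name diff_positions and its return shape (a list of [position, change] pairs) are made up for this example and are not part of the code above.

```perl
use strict;
use warnings;

my @bases = qw(A T C G);
my @change;    # reverse lookup, indexed by ord() of the XOR character

for my $t1 (@bases) {
    for my $t2 (@bases) {
        (my $m = $t2) =~ tr/ATCG/HRDZ/;
        $change[ ord($t1 ^ $m) ] = "$t1->$t2";
    }
}

# Hypothetical helper: returns [position, "X->Y"] pairs for each mismatch.
sub diff_positions {
    my ($target, $str) = @_;
    $str =~ tr/ATCG/HRDZ/;
    my $diff = $target ^ $str;
    my @found;
    # \x09 \x06 \x07 \x1d are the XOR values of matching bases
    while ($diff =~ /([^\x09\x06\x07\x1d])/g) {
        push @found, [ pos($diff), $change[ ord $1 ] ];
    }
    return @found;
}

printf "  %d: %s\n", @$_ for diff_positions("ATTCCGGG", "ATACCGGC");
```

With these inputs it reports 3: T->A and 8: G->C, the same as the hash version. Whether the array lookup is measurably faster than the hash would have to be benchmarked.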
In reply to Re: mismatching characters in dna sequence
by Eliya
in thread mismatching characters in dna sequence
by prbndr