Re: mismatching characters in dna sequence

Runs fine for me with 5 million characters in each string: it only takes about 3 seconds, and this is a really old computer.

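use strict;
use warnings;

# Two test strings, each a little over 5 million characters long;
# they differ only at (zero-based) positions 2 and 7.
my $s1 = 'ATACCGGC' . ('ATTTT' x 1_000_000);
my $s2 = 'ATTCCGGG' . ('ATTTT' x 1_000_000);

# Walk both strings one character at a time (assumes equal lengths)
# and record [position, char in $s1, char in $s2] for every mismatch.
my @m;
for my $i (0 .. length($s1) - 1) {
    my $c1 = substr($s1, $i, 1);
    my $c2 = substr($s2, $i, 1);

    push @m, [$i, $c1, $c2]
        if $c1 ne $c2;
}

# Positions are zero-based; each pair is printed as "<char in $s2>-><char in $s1>".
print scalar(@m) . ' mismatches with target at position(s) '
    . join(', ', map { $_->[0] } @m)
    . ' (' . join(', ', map { $_->[2] . '->' . $_->[1] } @m) . ")\n";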

Re^2: mismatching characters in dna sequence by Eliya (Vicar) on Dec 30, 2011 at 04:35 UTC
When comparing many strings against many others (not just two), performance might actually matter. On my machine, your substr method takes 2.65 seconds, while the XOR method I suggested above takes only 0.12 seconds on the same data, i.e. it is roughly 20 times as fast.
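The XOR post referred to above is not shown in this thread, so here is a minimal sketch of the general string-XOR technique, which may differ from what Eliya actually posted. It assumes equal-length ASCII strings and Perl's default string behaviour for `^` (i.e. `use feature 'bitwise'` is not enabled). The helper names `mismatch_positions` and `mismatch_count` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not the posted code): returns the zero-based
# positions at which two equal-length ASCII strings differ.
sub mismatch_positions {
    my ($s1, $s2) = @_;
    die "strings must have equal length\n" unless length($s1) == length($s2);

    # String XOR: identical characters become "\0",
    # differing characters become some non-null byte.
    my $diff = $s1 ^ $s2;

    # Scan for non-null bytes; $-[0] is the offset where each match starts.
    my @pos;
    push @pos, $-[0] while $diff =~ /[^\0]/g;
    return @pos;
}

# Hypothetical helper: counting needs no loop at all, because
# tr/\0//c counts the non-null bytes without modifying $diff.
sub mismatch_count {
    my ($s1, $s2) = @_;
    my $diff = $s1 ^ $s2;
    return $diff =~ tr/\0//c;
}

my @pos = mismatch_positions('ATACCGGC', 'ATTCCGGG');
print scalar(@pos), " mismatches at position(s) ", join(', ', @pos), "\n";
```

The speed-up comes from the fact that the XOR and the regex/tr scan run inside Perl's C internals over the whole string, instead of calling `substr` twice per character from Perl code.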
Re^3: mismatching characters in dna sequence by prbndr (Acolyte) on Dec 30, 2011 at 04:44 UTC
Eliya, is your transliteration correct? The results differ from the other code snippets already posted.
Re^4: mismatching characters in dna sequence by Eliya (Vicar) on Dec 30, 2011 at 05:00 UTC
AFAICT, they differ only in being zero- vs. one-based, i.e. in my output "4" means the 4th character. Either can be converted to the other by adding or subtracting 1. Or which difference are you referring to?
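A quick way to check that two methods agree apart from indexing is to compute the positions both ways and shift one list by 1 before comparing. This is a minimal sketch on the short prefix of the test data above; the one-based list only mimics a one-based report and is not the output of any posted code.

```perl
use strict;
use warnings;

# Short test strings (the first 8 characters of the benchmark data).
my ($s1, $s2) = ('ATACCGGC', 'ATTCCGGG');

# Method 1: substr comparison, giving zero-based positions.
my @substr_pos = grep { substr($s1, $_, 1) ne substr($s2, $_, 1) }
                 0 .. length($s1) - 1;

# Method 2: XOR scan, reported one-based by adding 1 to each offset.
my $diff = $s1 ^ $s2;
my @xor_pos_1based;
push @xor_pos_1based, $-[0] + 1 while $diff =~ /[^\0]/g;

# Subtract 1 to bring the one-based list back to zero-based.
my @xor_pos = map { $_ - 1 } @xor_pos_1based;

print "substr (0-based): @substr_pos\n";
print "xor    (1-based): @xor_pos_1based\n";
print "agree after conversion: ", ("@substr_pos" eq "@xor_pos" ? 'yes' : 'no'), "\n";
```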
Re^5: mismatching characters in dna sequence by prbndr (Acolyte) on Dec 30, 2011 at 05:05 UTC
Re^6: mismatching characters in dna sequence by Eliya (Vicar) on Dec 30, 2011 at 05:44 UTC