in reply to mismatching characters in dna sequence

Runs fine for me with 5 million characters in each string. It only takes about 3 seconds, and this is a really old computer.

    use strict;
    use warnings;

    # Two test sequences: an 8-base prefix followed by 5 million bases of padding.
    my $s1 = 'ATACCGGC' . 'ATTTT' x 1000000;
    my $s2 = 'ATTCCGGG' . 'ATTTT' x 1000000;

    my @m;    # each element: [zero-based position, char in $s1, char in $s2]
    for my $i (0 .. length($s1) - 1) {
        my $c1 = substr($s1, $i, 1);
        my $c2 = substr($s2, $i, 1);
        push @m, [$i, $c1, $c2] if $c1 ne $c2;
    }

    print scalar(@m) . ' mismatches with target at position(s) '
        . join(', ', map { $_->[0] } @m)
        . ' (' . join(', ', map { $_->[2] . '->' . $_->[1] } @m) . ")\n";

Re^2: mismatching characters in dna sequence
by Eliya (Vicar) on Dec 30, 2011 at 04:35 UTC

    When comparing many strings against many strings (not just two), performance might actually matter.

    On my machine, your substr method takes 2.65 secs, while the XOR method I suggested above takes only 0.12 secs (for the same data), i.e. it's roughly 20 times as fast.
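    The post with the XOR method is not quoted here, so the following is only a minimal sketch of the general string-XOR technique, not necessarily Eliya's exact code. The sub name mismatch_positions is hypothetical. XORing two strings yields "\0" wherever the characters are equal, so a regex scan for non-NUL bytes finds every mismatch without a per-character substr loop.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based positions where $s1 and $s2 differ.
# Sketch of the string-XOR technique; not the original poster's code.
sub mismatch_positions {
    my ($s1, $s2) = @_;
    my $diff = $s1 ^ $s2;       # bitwise string XOR: "\0" where chars match
    my @pos;
    while ($diff =~ /[^\0]/g) { # walk over every non-NUL (mismatching) byte
        push @pos, pos($diff);  # pos() is the offset just past the match, i.e. 1-based
    }
    return @pos;
}

my @pos = mismatch_positions('ATACCGGC', 'ATTCCGGG');
print join(', ', @pos), "\n";   # prints "3, 8"
```

    Note that if the strings have different lengths, XOR pads the shorter one with "\0", so every extra character in the longer string is reported as a mismatch.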

      Eliya, is your transliteration correct? The results differ from the other code snippets already posted.

        AFAICT, they only differ in being zero- vs. one-based, i.e. in my output "4" means the 4th character. Either can be trivially converted to the other by adding or subtracting 1. Or are you referring to some other difference?
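        To illustrate the zero- vs. one-based point, here is a small sketch (assuming a regex scan over an XORed string, as in the XOR technique; variable names are illustrative). Inside the match loop, pos() gives the offset just past the match (1-based), while $-[0] gives the offset where the match starts (0-based, matching the substr loop).

```perl
use strict;
use warnings;

my ($s1, $s2) = ('ATACCGGC', 'ATTCCGGG');
my $diff = $s1 ^ $s2;              # "\0" wherever the characters agree

my (@one_based, @zero_based);
while ($diff =~ /[^\0]/g) {
    push @one_based,  pos($diff);  # offset just past the match
    push @zero_based, $-[0];       # offset where the match starts
}

print "one-based:  @one_based\n";  # prints "one-based:  3 8"
print "zero-based: @zero_based\n"; # prints "zero-based: 2 7"

# Converting after the fact is just an off-by-one shift.
my @converted = map { $_ - 1 } @one_based;
print "converted:  @converted\n";  # prints "converted:  2 7"
```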