in reply to Re^2: mismatching characters in dna sequence
in thread mismatching characters in dna sequence
Which method are you talking about, with respect to those 5 seconds? The XOR method outlined above takes just ~0.05 secs on my 4-year-oldish machine, for 10,000 comparisons against a common 40-char target (with ~3-5 deviations per sequence):
my @set = qw(A T C G);
my $target = join "", map $set[rand @set], 1..40;

my @tests;
for (1..10000) {
    my $test = $target;
    for (1..5) {
        substr($test, rand(length($target)), 1) = $set[rand @set];
    }
    push @tests, $test;
}

use Time::HiRes qw(time);
my $start = time();

my %change;  # reverse lookup table
for my $t1 (@set) {
    my $t = $t1;
    $t =~ tr/ATCG/HRDZ/;
    for my $t2 (@set) {
        $change{ $t ^ $t2 } = "$t1->$t2";
    }
}

$target =~ tr/ATCG/HRDZ/;

for my $test (@tests) {
    my $diff = $target ^ $test;
    while ($diff =~ /([^\x09\x06\x07\x1d])/g) {
        my $pos = pos($diff);
        my $change = $change{$1};
        # do something with them...
    }
}

printf "%.3f secs\n", time() - $start;

__END__
0.052 secs
Storing away the results somewhere or doing something else with them will presumably take considerably longer than computing them...
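To make the lookup-table idea a bit more concrete, here is a minimal sketch on a single short pair of sequences. The `mismatches` helper and the two sample strings are made up for illustration (they are not part of the benchmark), and the sketch assumes both strings have the same length and contain only A, T, C and G:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [position, "X->Y"] pairs, where
# position is 1-based, X is the base in the target and Y the base in the test.
sub mismatches {
    my ($target, $test) = @_;
    my @set = qw(A T C G);

    # Reverse lookup table: XOR result => "from->to".
    # The four "no change" results are \x09, \x06, \x07 and \x1d.
    my %change;
    for my $t1 (@set) {
        (my $t = $t1) =~ tr/ATCG/HRDZ/;
        for my $t2 (@set) {
            $change{ $t ^ $t2 } = "$t1->$t2";
        }
    }

    # Translate a copy of the target, then XOR it bytewise with the test.
    (my $masked = $target) =~ tr/ATCG/HRDZ/;
    my $diff = $masked ^ $test;

    my @found;
    while ($diff =~ /([^\x09\x06\x07\x1d])/g) {
        # pos() is the offset just past the match, i.e. the 1-based position
        push @found, [ pos($diff), $change{$1} ];
    }
    return @found;
}

printf "%d: %s\n", @$_ for mismatches("ATCGATCG", "ATGGATCA");
```

This prints `3: C->G` and `8: G->A`. The translation to HRDZ is what makes all 16 XOR results distinct, so a single byte of the diff identifies both the original and the replacement base.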
Re^4: mismatching characters in dna sequence
by prbndr (Acolyte) on Dec 30, 2011 at 17:13 UTC