Are my assumptions correct? Or are the sequences already aligned, and you just want to count the differences?
In either case, once the alignment is done, counting the differences is simple: iterate through both arrays in parallel, work out each base's position within its triplet (1, 2 or 3), and keep a hash keyed by that position. Each entry holds a subhash that records the type of every change seen at that position and how many times it occurred.
Example:

    use strict;
    use Data::Dumper;

    my $r = 'AAATGTGATGTGAACGT';
    my $t = 'AATGTGTCGT-TG-ATG';
    my @a = split('',$r);
    my @v = split('',$t);
    my %hash = ();
    my $tt = @a>@v ? @a : @v;

    for(my $i = 0 ; $i<$tt; $i++){
        my $z = 1+$i %3; # Update - suggested by Perlbotics, better !
        unless ($a[$i] eq $v[$i]){
            $hash{$z}->{"$a[$i]2$v[$i]"}++;
        }
    }
    print Dumper(\%hash);

Result:

    $VAR1 = {
      '1' => { 'G2T' => 3, 'T2G' => 1, 'A2G' => 1 },
      '3' => { 'G2T' => 1, 'C2A' => 1, 'T2G' => 2, 'A2T' => 1 },
      '2' => { 'G2T' => 1, 'T2G' => 1, 'A2-' => 1, 'A2C' => 1, 'T2-' => 1 }
    };
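
If you need this in more than one place, the same counting could be wrapped in a subroutine. What follows is only a sketch: the name `count_changes` is made up for illustration, and it assumes both sequences are already aligned and therefore of equal length (it dies otherwise, rather than comparing against undefined values). It uses `substr` instead of `split`, and keeps the same "X2Y" key format as the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original example): count the changes
# between two pre-aligned sequences, grouped by position within the triplet.
sub count_changes {
    my ($ref, $alt) = @_;

    # Assumption: aligned input, so both strings must have the same length.
    die "aligned sequences must have equal length\n"
        unless length($ref) == length($alt);

    my %changes;
    for my $i (0 .. length($ref) - 1) {
        my $from = substr($ref, $i, 1);
        my $to   = substr($alt, $i, 1);
        next if $from eq $to;               # identical bases are not counted

        my $pos = 1 + $i % 3;               # 1, 2 or 3 within the triplet
        $changes{$pos}{"${from}2${to}"}++;  # e.g. 'G2T', 'A2-'
    }
    return \%changes;
}

# Usage: print one line per position and change type, in sorted order.
my $changes = count_changes('AAATGTGATGTGAACGT', 'AATGTGTCGT-TG-ATG');
for my $pos (sort keys %$changes) {
    for my $type (sort keys %{ $changes->{$pos} }) {
        print "position $pos: $type x $changes->{$pos}{$type}\n";
    }
}
```

Sorting the keys before printing gives a stable order, which Data::Dumper does not guarantee unless `$Data::Dumper::Sortkeys` is set.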