I wish to extract all combinations of k-mers (k-id segments) that have a total score larger than, let's say, 17, together with all k-mers that are similar enough to them under the same criterion. The scores come from this matrix:

    id   0  1  2  3  4  5
      |-------------------
     0|  4,-1,-2,-2, 0,-1
     1| -1, 5, 0, 2,-3, 1
     2| -2, 0, 6, 1,-3, 5
     3| -2, 2, 1, 7,-3, 0
     4|  0,-3,-3,-3, 8,-3
     5| -1, 1, 5, 0,-3, 9

A k-mer's own score is the sum of the diagonal entries for its ids. Let k = 3; then

    3,4,5 -> score 7+8+9 = 24 > 17
    0,1,2 -> score 4+5+6 = 15 < 17

Similar is meant position by position. Let a k-mer be 2,3,4 -> 6+7+8 = 21; a k-mer similar to it is 5,3,4:

    2->5 = 5
    3->3 = 7
    4->4 = 8
    so 2,3,4 -> 5,3,4 = 5+7+8 = 20 > 17

So only positional similarity is relevant: the id at each position is compared only with the id at the same position. Also, the rule is: if any id is changed, the score for a mismatch (2->5) cannot be greater than the score for the match (2->2):

    score(2->5) < score(2->2)

The naive approach for k = 3 would be:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # $MTX->[$a][$b] is the score for replacing id $a with id $b
    my $MTX = [
        [ 4, -1, -2, -2,  0, -1],
        [-1,  5,  0,  2, -3,  1],
        [-2,  0,  6,  1, -3,  5],
        [-2,  2,  1,  7, -3,  0],
        [ 0, -3, -3, -3,  8, -3],
        [-1,  1,  5,  0, -3,  9],
    ];
    my $T = 17;    # keep only totals strictly above this threshold

    for my $i (0 .. 5) {
        for my $j (0 .. 5) {
            for my $k (0 .. 5) {
                # own score of the k-mer $i,$j,$k (diagonal entries)
                my $sc = $MTX->[$i][$i] + $MTX->[$j][$j] + $MTX->[$k][$k];
                next if $sc <= $T;
                # score every 3-mer $ii,$jj,$kk against $i,$j,$k, position by position
                for my $ii (0 .. 5) {
                    for my $jj (0 .. 5) {
                        for my $kk (0 .. 5) {
                            my $score = $MTX->[$i][$ii] + $MTX->[$j][$jj] + $MTX->[$k][$kk];
                            print " $i,$j,$k -> $ii,$jj,$kk = $score\n" if $score > $T;
                        }
                    }
                }
            }
        }
    }

The other question is: what would be the best way to generalize this to any k, not just 3 as in my example above?
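One possible way to attack both questions at once, sketched here under stated assumptions rather than as a known solution: grow each k-mer one position at a time and drop a prefix as soon as even the best possible scores for the remaining positions cannot lift the total above the threshold (branch and bound). The recursion depth is k, so nothing is tied to three nested loops. The sub names `extend`, `search`, `seeds` and `neighbours` are hypothetical. The bound for the remaining positions uses each row's maximum; under the stated rule that maximum is the diagonal entry, so for neighbours of q the bound is simply q's own score on those positions. The idea resembles how BLAST builds its list of neighbourhood words scoring at least a threshold T against a query word.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

my $MTX = [
    [ 4, -1, -2, -2,  0, -1],
    [-1,  5,  0,  2, -3,  1],
    [-2,  0,  6,  1, -3,  5],
    [-2,  2,  1,  7, -3,  0],
    [ 0, -3, -3, -3,  8, -3],
    [-1,  1,  5,  0, -3,  9],
];
my $T = 17;    # keep only totals strictly above this threshold

# The stated rule: a mismatch never outscores the match in its row.
for my $r (0 .. $#$MTX) {
    for my $c (0 .. $#$MTX) {
        die "rule violated at ($r,$c)\n"
            if $c != $r && $MTX->[$r][$c] >= $MTX->[$r][$r];
    }
}

my $MAX_DIAG = max map { $MTX->[$_][$_] } 0 .. $#$MTX;

# Hypothetical helper: depth-first extension with branch-and-bound pruning.
# $gain->($p, $id): score for putting $id at position $p.
# $rest->[$p]: upper bound on what positions $p .. $k-1 can still add.
sub extend {
    my ($k, $gain, $rest, $prefix, $score, $hits) = @_;
    my $p = @$prefix;
    if ($p == $k) {
        push @$hits, [ [@$prefix], $score ];
        return;
    }
    for my $id (0 .. $#$MTX) {
        my $s = $score + $gain->($p, $id);
        next if $s + $rest->[$p + 1] <= $T;    # can no longer exceed $T
        extend($k, $gain, $rest, [ @$prefix, $id ], $s, $hits);
    }
}

# Returns [k-mer, score] pairs with score > $T, in lexicographic order.
sub search {
    my ($k, $gain, $best) = @_;    # $best->($p): max of $gain->($p, $id)
    my @rest = (0) x ($k + 1);
    $rest[$_] = $rest[$_ + 1] + $best->($_) for reverse 0 .. $k - 1;
    my @hits;
    extend($k, $gain, \@rest, [], 0, \@hits);
    return @hits;
}

# k-mers whose own (diagonal) score is above $T
sub seeds {
    my ($k) = @_;
    return search($k, sub { $MTX->[ $_[1] ][ $_[1] ] }, sub { $MAX_DIAG });
}

# k-mers scoring above $T when compared position by position with $q
sub neighbours {
    my ($q) = @_;
    return search(scalar @$q,
        sub { $MTX->[ $q->[ $_[0] ] ][ $_[1] ] },
        sub { max @{ $MTX->[ $q->[ $_[0] ] ] } });
}

my $k = 3;    # any k works; nothing here is hard-wired to 3
for my $seed (seeds($k)) {
    my $q = $seed->[0];
    for my $hit (neighbours($q)) {
        printf "%s -> %s = %d\n",
            join(',', @$q), join(',', @{ $hit->[0] }), $hit->[1];
    }
}
```

For k = 3 it visits k-mers in the same lexicographic order as the nested loops, so its output can be compared line by line with the naive version when both use a strict `> 17` test. For the 2,3,4 example only ids 2 and 5 survive the first position; every other branch is cut there.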
Thanks.
PS: code is not necessary.
Thank you.
UPDATE:
OK, after thinking a bit about it, I guess the answer to the second question is straightforward. Since the alphabet size is constant (6), each k-mer can be encoded as a unique integer (from 0 to 6^k - 1, or from 1 to 6^k when counting from 1), i.e. its base-6 representation. Thus the first three for-loops can be replaced by a single one (the same goes for the second three), so the nested-loop problem disappears and the iteration generalizes easily to k-mers of any size k. Which leaves the first question still open.

UPDATE:

To elaborate on my previous update, this is what I meant:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $k = 3;
    # walk every code 0 .. 6**$k - 1 and print the k-mer it stands for
    for (my $i = 0; $i < 6**$k; $i++) {
        my $t = decode($i, 6, $k);
        print "@{$t}\n";
    }

    # decode($int, $alpha, $k): turn $int into its $k base-$alpha digits,
    # most significant digit first
    sub decode {
        my ($int, $alpha, $k) = @_;
        my $e = $int;       # part of $int not yet decoded
        my @aa;
        my $j = $k - 1;     # exponent of the current digit
        my $i = 0;
        for ($i = 0; $i < $k - 1; $i++, $j--) {
            my $p = $alpha**$j;
            $aa[$i] = int($e / $p);    # current digit
            $e = $e % $p;
        }
        $aa[$i] = $e;       # last (least significant) digit
        return \@aa;
    }

    =pod

    ## variant for codes starting at 1 instead of 0
    sub decode {
        my ($int, $alpha, $k) = @_;
        my $r = 0;
        my $e = $int;
        my $rq = 0;
        my $rr = 0;
        my $p = 0;
        my @aa;
        my $j = $k - 1;
        my $i = 0;
        for ($i = 0; $i < $k - 1; $i++, $j--) {
            $p = $alpha**$j;
            $rr = $e % $p;
            $rq = $e / $p;
            if ($rr == 0) {
                $rr = ($e - 1) % $p;
                $rq = ($e - 1) / $p;
                $r = 1;
            }
            $aa[$i] = int($rq);
            $e = $rr;
        }
        $aa[$i] = $e + $r - 1;
        return \@aa;
    }

    =cut

Yes, of course this can be optimized and done in a more Perlish way, but this is just a conceptual problem requiring a conceptual solution :) Thanks.
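The flattened loop from the update can also be written with integer division and modulo instead of explicit powers of 6. This is a minimal sketch: `encode_kmer`, `decode_kmer` and `flat_pairs` are hypothetical names, `$ALPHA` is the fixed alphabet size 6, and the digit order (first id most significant) is assumed to match `decode()` above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $MTX = [
    [ 4, -1, -2, -2,  0, -1],
    [-1,  5,  0,  2, -3,  1],
    [-2,  0,  6,  1, -3,  5],
    [-2,  2,  1,  7, -3,  0],
    [ 0, -3, -3, -3,  8, -3],
    [-1,  1,  5,  0, -3,  9],
];
my ($ALPHA, $T) = (6, 17);

# k-mer -> integer in 0 .. $ALPHA**$k - 1, first id most significant
sub encode_kmer {
    my ($kmer) = @_;
    my $code = 0;
    $code = $code * $ALPHA + $_ for @$kmer;    # Horner's rule
    return $code;
}

# integer -> k-mer (array ref); inverse of encode_kmer
sub decode_kmer {
    my ($code, $k) = @_;
    my @kmer;
    for (1 .. $k) {
        unshift @kmer, $code % $ALPHA;    # peel off the last id
        $code = int($code / $ALPHA);
    }
    return \@kmer;
}

# The naive double loop, flattened: one loop per k-mer instead of k nested loops.
sub flat_pairs {
    my ($k) = @_;
    my $n = $ALPHA**$k;
    my @pairs;
    for my $c1 (0 .. $n - 1) {
        my $q = decode_kmer($c1, $k);
        my $self = 0;
        $self += $MTX->[$_][$_] for @$q;    # own (diagonal) score
        next if $self <= $T;
        for my $c2 (0 .. $n - 1) {
            my $t = decode_kmer($c2, $k);
            my $score = 0;
            $score += $MTX->[ $q->[$_] ][ $t->[$_] ] for 0 .. $k - 1;
            push @pairs, [ $q, $t, $score ] if $score > $T;
        }
    }
    return @pairs;
}

for my $pair (flat_pairs(3)) {
    my ($q, $t, $score) = @$pair;
    print join(',', @$q), ' -> ', join(',', @$t), " = $score\n";
}
```

This generalizes the loops to any k, but it still examines up to 6^(2k) pairs, so on its own it does not reduce the amount of work.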
In reply to Combinatorial problem by baxy77bax