I'm not sure how to explain this... I have DNA sequence alignments, that is, sequences of letters stacked up so that similar bases (letters) are on top of each other. In addition, each base in a sequence has a quality score assigned to it.
Something like:
A(45) C(44) T(44) A(45)
A(31) T(31) T(35) A(37)
A(50) C(52) A(52) A(52)
I'm looking for variation in those sequences, but for the variation to be reliable I've calculated that I need to find groups of sequences with the same base whose qualities add up to at least 50.
So in the previous example the second column C(44+52=96)/T(31) does not have enough quality to be considered (only one of the bases reaches the required quality), but the third column T(44+35=79)/A(52) does.
With your script I was trying to estimate how many of those positions in the alignment I can even consider analyzing, i.e. how many of those positions (arrays of quality scores) can be split into two subarrays that each pass the threshold.
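To make the rule concrete, here is a minimal Python sketch of the per-column check described above. It is only an illustration, not the script discussed in this thread: the `(base, quality)` data layout and the function names are assumptions of mine, and the threshold of 50 is the one I mentioned. A column counts as usable when at least two different bases each reach the threshold in summed quality.

```python
from collections import defaultdict

THRESHOLD = 50  # minimum summed quality a base group must reach (value from the post)

# Hypothetical layout: one row per aligned sequence, each base stored as (base, quality).
# The values are the three example sequences shown above.
alignment = [
    [("A", 45), ("C", 44), ("T", 44), ("A", 45)],
    [("A", 31), ("T", 31), ("T", 35), ("A", 37)],
    [("A", 50), ("C", 52), ("A", 52), ("A", 52)],
]

def quality_by_base(column):
    """Sum the quality scores of one alignment column, grouped by base."""
    totals = defaultdict(int)
    for base, quality in column:
        totals[base] += quality
    return dict(totals)

def is_reliable_variant(column, threshold=THRESHOLD):
    """True if at least two different bases each reach the threshold."""
    totals = quality_by_base(column)
    passing = [base for base, total in totals.items() if total >= threshold]
    return len(passing) >= 2

def reliable_columns(rows, threshold=THRESHOLD):
    """Return the 0-based indices of columns that pass the check."""
    columns = zip(*rows)  # transpose rows (sequences) into columns (positions)
    return [i for i, col in enumerate(columns) if is_reliable_variant(col, threshold)]

print(reliable_columns(alignment))  # [2]: only the third column, T(79)/A(52), passes
```

Note that this version uses the base calls themselves. The estimate I was after with the script only looks at the quality scores of a position, so it is an upper bound on what this check would accept.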
Sorry if it's not clear.
Pepe
I know...
That's why I didn't want to go into detail...
Anyway, you guys have been of great help. I really appreciate it.
Thanks.
Pepe