Sorry, sorry, sorry, Monks! Here are some more detailed explanations and examples; I hope they help.
First, a little sequence-alignment lingo from the biologist's dictionary:
substitution: when a letter in one sequence is replaced by a different letter in another sequence. E.g. GGTA is substituted in 2 places to give CCTA, and TACGACT is substituted in 1 place to give AACGACT.
indel: when a letter in one sequence is replaced by nothing in the second sequence. The 1st sequence is said to have an insertion (IN) and the 2nd sequence a deletion (DEL), hence the term INDEL. E.g. TAGAGGATC and TAGAGATC differ by 1 indel position, so the 2nd sequence, when aligned, would be written TAGAG-ATC or TAGA-GATC.
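For readers coming from the programming side, here is a minimal Python sketch of the two terms, assuming the sequences are plain uppercase strings. The function names count_substitutions and single_indel_alignments are made up for illustration; they are not from any bioinformatics library.

```python
def count_substitutions(a: str, b: str) -> int:
    """Count positions where two equal-length sequences carry different letters."""
    if len(a) != len(b):
        raise ValueError("substitutions are only counted between equal-length sequences")
    return sum(1 for x, y in zip(a, b) if x != y)


def single_indel_alignments(longer: str, shorter: str) -> list:
    """Return every way to write `shorter` with one '-' so it lines up with `longer`.

    Each result corresponds to deleting one letter from `longer`; an empty
    list means the two sequences do not differ by exactly one indel.
    """
    if len(longer) != len(shorter) + 1:
        return []
    alignments = []
    for i in range(len(longer)):
        # Removing longer[i] must reproduce the shorter sequence exactly.
        if longer[:i] + longer[i + 1:] == shorter:
            alignments.append(shorter[:i] + "-" + shorter[i:])
    return alignments


print(count_substitutions("GGTA", "CCTA"))        # 2
print(count_substitutions("TACGACT", "AACGACT"))  # 1
print(single_indel_alignments("TAGAGGATC", "TAGAGATC"))  # ['TAGA-GATC', 'TAGAG-ATC']
```

The indel example has two valid gap placements because the deleted G sits in a run of two Gs, so the sketch returns both.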
case c: the mismatch is not due to the substitution of one letter for another, but to a gap (shown as '-' here) caused by a missing letter when the 2 sequences are compared:
AC-TACGTAC
ACGTACGTAC
or
ACGTACGTAC
ACGTACGT-C
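A small sketch of how the gap in a case c alignment could be located, assuming both rows are given with '-' marking gaps and have equal length. The helper name gap_columns is hypothetical.

```python
def gap_columns(top: str, bottom: str) -> list:
    """Return (column, row) pairs for every gap in a two-row alignment.

    Columns are 1-based; row is "top" or "bottom", i.e. the sequence that
    carries the gap (the one with the deletion relative to the other).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    gaps = []
    for col, (t, b) in enumerate(zip(top, bottom), start=1):
        if t == "-":
            gaps.append((col, "top"))
        if b == "-":
            gaps.append((col, "bottom"))
    return gaps


print(gap_columns("AC-TACGTAC", "ACGTACGTAC"))  # [(3, 'top')]
print(gap_columns("ACGTACGTAC", "ACGTACGT-C"))  # [(9, 'bottom')]
```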
case d: the mismatch is due to the substitution of one letter for another, not to an insertion or deletion as shown in the case c examples above:
CTTACGTAC
CGTACGTAC
or
CGTACGTGC
CGTACGTCC
case e: the same as case d above, except that the matched sequences are 10 letters long rather than 9. Again, the mismatch is due to a substitution, not an insertion or deletion:
ACCTACGTAC
ACGTACGTAC
or
GTACGTACGG
GTACGTTCGG
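To tell case c apart from cases d and e when starting from the raw, unaligned sequences, one rough approach is sketched below. It assumes the only differences of interest are a single substitution or a single indel; the name classify_difference is hypothetical.

```python
def classify_difference(a: str, b: str) -> str:
    """Label how two unaligned sequences differ.

    'identical'    - same letters, same length
    'substitution' - same length, exactly one differing position (cases d, e)
    'indel'        - lengths differ by one, and deleting a single letter from
                     the longer sequence gives the shorter one (case c)
    'other'        - anything that needs more than one change
    """
    if a == b:
        return "identical"
    if len(a) == len(b):
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        return "substitution" if diffs == 1 else "other"
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if len(longer) - len(shorter) == 1:
        for i in range(len(longer)):
            if longer[:i] + longer[i + 1:] == shorter:
                return "indel"
    return "other"


print(classify_difference("CTTACGTAC", "CGTACGTAC"))    # substitution (case d)
print(classify_difference("GTACGTACGG", "GTACGTTCGG"))  # substitution (case e)
print(classify_difference("ACTACGTAC", "ACGTACGTAC"))   # indel (case c)
```

For case c the gap is removed first, so AC-TACGTAC becomes ACTACGTAC before it is compared with ACGTACGTAC.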
Some examples of what should and should not pass the filter are shown below:
10nt sequences, no indels, no substitutions, perfect matches, passes filter, all OK
ATGGACGTAC
ATGGACGTAC
9nt sequences, no indels, no substitutions, perfect matches, passes filter, all OK
CGTACAGTA
CGTACAGTA
10nt sequences, 1 indel position, passes filter, OK
AC-TACGTAC
ACGTACGTAC
10nt sequences, 2 indel positions in total, 1 indel in each sequence, does not pass filter, not OK
AC-TACGTAC
ACGTACG-AC
10nt sequences, 2 indel positions in total, both indels in the same sequence, does not pass filter, not OK
AC-TAC-TAC
ACGTACGTAC
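Assuming the filter is applied to alignments that are already written out with '-' for gaps, as in the examples above, a minimal Python sketch could count the columns where the two rows differ and accept at most one. The name passes_filter and its max_differences parameter are illustrative assumptions, not an existing API.

```python
def passes_filter(top: str, bottom: str, max_differences: int = 1) -> bool:
    """True if a two-row alignment has at most `max_differences` differing columns.

    A column counts as a difference when its two characters differ: a '-'
    facing a letter is an indel column, two different letters a substitution.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    differences = sum(1 for t, b in zip(top, bottom) if t != b)
    return differences <= max_differences


examples = [
    ("ATGGACGTAC", "ATGGACGTAC"),  # perfect 10nt match
    ("CGTACAGTA", "CGTACAGTA"),    # perfect 9nt match
    ("AC-TACGTAC", "ACGTACGTAC"),  # 1 indel
    ("AC-TACGTAC", "ACGTACG-AC"),  # 2 indels, one in each sequence
    ("AC-TAC-TAC", "ACGTACGTAC"),  # 2 indels, both in the same sequence
]
for top, bottom in examples:
    print(top, bottom, "OK" if passes_filter(top, bottom) else "not OK")
```

Treating indels and substitutions alike as "one differing column" matches the rule of at most 1 indel or substitution; a stricter or looser filter would only need a different max_differences.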
The bottom line: when two 10-letter sequences are aligned to each other, at least 9 letters must be aligned, with at most 1 indel or substitution. At best, all 10 letters match perfectly, with no indels and no substitutions. The examples above illustrate some of the cases in between. I hope this is a little easier to understand now, especially for non-biologists. Sorry for the cryptic explanation in my OP!