Re: Filtering matches of near-perfect-matched DNA sequence pairs

Replies are listed 'Best First'.
Re^2: Filtering matches of near-perfect-matched DNA sequence pairs by onlyIDleft (Scribe) on Mar 13, 2015 at 23:30 UTC
Sorry, sorry, sorry Monks! Here are some more verbose explanations and examples; I hope they help.

Before that, a little bit of sequence alignment lingo from the biologist's dictionary:

substitution: when a letter in one sequence is replaced by a different letter in another sequence. E.g. GGTA substituted in 2 places gives CCTA, and TACGACT substituted in 1 place gives AACGACT.

indel: when a letter in one sequence is matched by nothing in the second sequence. The 1st sequence is said to have an insertion (IN) and the 2nd sequence a deletion (DEL), hence the term INDEL. E.g. TAGAGGATC and TAGAGATC differ by 1 indel position, so the 2nd sequence when aligned would be TAGAG-ATC or TAGA-GATC.

case c: the mismatch is not due to substitution of one letter for another, but to a gap (shown as '-' here) from a missing letter when comparing the 2 sequences:

`AC-TACGTAC ACGTACGTAC` or `ACGTACGTAC ACGTACGT-C`

case d: the mismatch is due to substitution of one letter for another, and not to an insertion or deletion as shown in the examples for case c:

`CTTACGTAC CGTACGTAC` or `CGTACGTGC CGTACGTCC`

case e: same as case d, except the matched lengths are 10 letters and not 9 letters as in case d. Again, the mismatch is due to a substitution and not to an insertion or deletion:

`ACCTACGTAC ACGTACGTAC` or `GTACGTACGG GTACGTTCGG`

Some examples of what should pass the filter and what should not:

10nt sequences, no indels, no substitutions, perfect match: passes the filter, OK
`ATGGACGTAC ATGGACGTAC`

9nt sequences, no indels, no substitutions, perfect match: passes the filter, OK
`CGTACAGTA CGTACAGTA`

10nt sequences, 1 indel position: passes the filter, OK
`AC-TACGTAC ACGTACGTAC`

10nt sequences, 2 indel positions in total, one indel on each sequence: does not pass the filter, not OK
`AC-TACGTAC ACGTACG-AC`

10nt sequences, 2 indel positions in total, both indels on the same sequence: does not pass the filter, not OK
`AC-TAC-TAC ACGTACGTAC`

Bottom line: when sequences of 10 letters are aligned to each other, at the very minimum 9 letters must be aligned, with a maximum of 1 indel or substitution. At the very best, all 10 letters are perfectly matched with no indels and no substitutions. Everything in between is an intermediate case, some of which are shown in the alignments above.

I hope things are a little easier to understand now, especially for non-biologists. Sorry for the cryptic explanation in my OP!
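The pass/fail rule above can be sketched in code. This is only an illustration in Python, not code from this thread, and it assumes the two sequences are already aligned, of equal length, and use '-' for gaps exactly as in the examples. The names `count_differences` and `passes_filter` are made up for this sketch.

```python
def count_differences(aligned_a, aligned_b, gap="-"):
    """Return (substitutions, indels) for two aligned sequences.

    Assumes both strings have the same length and mark gaps with `gap`.
    A column with a gap on either side is counted as one indel position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    substitutions = 0
    indels = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            indels += 1
        elif x != y:
            substitutions += 1
    return substitutions, indels


def passes_filter(aligned_a, aligned_b, max_diffs=1):
    """True if the pair has at most `max_diffs` indels plus substitutions."""
    substitutions, indels = count_differences(aligned_a, aligned_b)
    return substitutions + indels <= max_diffs


if __name__ == "__main__":
    pairs = [
        ("ATGGACGTAC", "ATGGACGTAC"),  # perfect match
        ("AC-TACGTAC", "ACGTACGTAC"),  # 1 indel
        ("AC-TACGTAC", "ACGTACG-AC"),  # 2 indels, one per sequence
        ("AC-TAC-TAC", "ACGTACGTAC"),  # 2 indels, same sequence
    ]
    for a, b in pairs:
        print(a, b, "OK" if passes_filter(a, b) else "not OK")
```

The sketch does not check that the sequences are 9 or 10 letters long; that would be a separate test on the input.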
Re^3: Filtering matches of near-perfect-matched DNA sequence pairs by choroba (Cardinal) on Mar 13, 2015 at 23:39 UTC
Thank you. I still don't understand, though. What is the input? Are the dashes already there? لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^4: Filtering matches of near-perfect-matched DNA sequence pairs by Anonymous Monk on Mar 14, 2015 at 00:31 UTC
And if the dashes are not there, are both sequences still 10 long? Unlike what you showed...
Re^4: Filtering matches of near-perfect-matched DNA sequence pairs by onlyIDleft (Scribe) on Mar 15, 2015 at 02:00 UTC
Nope, the dashes are only there to help the reader see where there is an insertion/deletion (indel) event rather than a substitution of a letter. Such a gap caused by indel(s), i.e. the absence of an aligned letter, is commonly signified by the '-' symbol in sequence alignments by biologists. You may replace it in your mind with just a blank space if that helps you. Hope that clarifies it.
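If the dashes are not part of the input, one way to read the requirement is "the two raw sequences are at most one edit (one substitution or one indel) apart". Below is a minimal Python sketch of that reading. It is an assumption about what the filter should do, not something confirmed in this thread, and the function name `within_one_edit` is made up for the example.

```python
def within_one_edit(a, b):
    """True if a and b are identical or differ by exactly one
    substitution, insertion, or deletion (no gap characters in the input)."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make sure a is the shorter (or equal-length) one
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Same length: allow one substitution at position i.
        return a[i + 1:] == b[i + 1:]
    # b is one letter longer: allow one extra letter in b at position i.
    return a[i:] == b[i + 1:]


if __name__ == "__main__":
    print(within_one_edit("ACTACGTAC", "ACGTACGTAC"))  # one indel
    print(within_one_edit("CTTACGTAC", "CGTACGTAC"))   # one substitution
    print(within_one_edit("ACTACGTAC", "ACGTACGAC"))   # two indels
```

Note that with the gaps removed, a pair with one indel has lengths that differ by one, which is the case the length check handles.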
Re^5: Filtering matches of near-perfect-matched DNA sequence pairs by Anonymous Monk on Mar 15, 2015 at 03:02 UTC
Re^6: Filtering matches of near-perfect-matched DNA sequence pairs by BrowserUk (Patriarch) on Mar 15, 2015 at 07:19 UTC