Re^3: mismatching characters in dna sequence

Re^4: mismatching characters in dna sequence by Eliya (Vicar) on Dec 30, 2011 at 05:00 UTC
AFAICT, they only differ in being zero- vs. one-based, i.e. in my output "4" means the 4th character. Both can be trivially converted into one another by adding or subtracting 1. Or which difference are you referring to?
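A minimal sketch of where the off-by-one comes from. It assumes the positions are taken from a global match over the XOR of the two strings, and the helper name `mismatch_positions` is hypothetical. After a single character has matched, `$-[0]` holds its zero-based offset and `pos()` holds the offset just past it, which is the same number as its one-based position:

```perl
use strict;
use warnings;

# Hypothetical helper: report every mismatch between two equal-length
# strings as [zero-based offset, one-based position].
sub mismatch_positions {
    my ($s1, $s2) = @_;
    my $diff = $s1 ^ $s2;    # string XOR: "\0" wherever the characters are equal
    my @pos;
    while ($diff =~ /[^\0]/g) {
        # $-[0]      = zero-based offset where the match starts
        # pos($diff) = offset just past the match, i.e. the one-based position
        push @pos, [ $-[0], pos($diff) ];
    }
    return @pos;
}

for my $p (mismatch_positions('ACGTACGT', 'ACGAACGA')) {
    printf "zero-based %d, one-based %d\n", @$p;
}
```

For the two sample strings this reports the mismatches at zero-based 3 and 7, i.e. the 4th and 8th characters.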
Re^5: mismatching characters in dna sequence by prbndr (Acolyte) on Dec 30, 2011 at 05:05 UTC
In some very rare cases I have a conversion from A->N; the N is just another character. The other code snippets catch this type of conversion, whereas your code calls it a G->A. Why is this the case?
Re^6: mismatching characters in dna sequence by Eliya (Vicar) on Dec 30, 2011 at 05:44 UTC
That's because you didn't mention the "N" in your original post :) The idea of the transliteration is that the XOR value computed for every (directed) comparison of characters is unique. This can only be ensured for a predefined set of allowed characters. To also allow "N", you could (for example) use the transliteration `tr/ATCGN/J4XD7/`. With this, the XOR values for the respective changes would compute as:

```
XOR val     change
 \x0b   =>  A->A *    ( "A" ^ "J" )
 \x19   =>  A->C      ( "A" ^ "X" )
 \x05   =>  A->G      ( "A" ^ "D" )
 \x76   =>  A->N        ...
 \x75   =>  A->T
 \x09   =>  C->A
 \x1b   =>  C->C *
 \x07   =>  C->G
 \x74   =>  C->N
 \x77   =>  C->T
 \x0d   =>  G->A
 \x1f   =>  G->C
 \x03   =>  G->G *
 \x70   =>  G->N
 \x73   =>  G->T
 \x04   =>  N->A
 \x16   =>  N->C
 \x0a   =>  N->G
 \x79   =>  N->N *
 \x7a   =>  N->T
 \x1e   =>  T->A
 \x0c   =>  T->C
 \x10   =>  T->G
 \x63   =>  T->N
 \x60   =>  T->T *
```

The ones marked with "*" are the "no-changes", which should make up the exclusion character set in the final match. I.e., with the above modified transliteration, you should change that to

```
while ($diff =~ /([^\x0b\x1b\x03\x79\x60])/g) {
```
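Putting the pieces together, a minimal self-contained sketch of the approach with "N" allowed might look like this. It assumes both sequences are uppercase and of equal length, and that the first sequence is left as-is while the second one is transliterated (which is how the table reads, e.g. `"A" ^ "X"` for A->C). Positions are reported one-based via `pos()`. The names `%code`, `%change` and `find_changes` are made up for illustration. The lookup table is built from the same mapping, and the script dies if two changes ever share an XOR value, which is a cheap way to check a new transliteration before relying on it:

```perl
use strict;
use warnings;

# Hypothetical mapping that mirrors tr/ATCGN/J4XD7/ (values are strings,
# so ^ below does string XOR, not numeric XOR).
my @bases = qw(A C G N T);
my %code  = (A => 'J', T => '4', C => 'X', G => 'D', N => '7');

# Map each XOR byte back to a "from->to" label; die on any collision.
my %change;
for my $from (@bases) {
    for my $to (@bases) {
        my $x = $from ^ $code{$to};
        die "collision for $from->$to\n" if exists $change{$x};
        $change{$x} = "$from->$to";
    }
}

# Returns [one-based position, "from->to"] for every changed character.
sub find_changes {
    my ($ref, $sample) = @_;
    (my $t = $sample) =~ tr/ATCGN/J4XD7/;    # encode the second sequence only
    my $diff = $ref ^ $t;
    my @out;
    # Skip the five no-change codes (A->A, C->C, G->G, N->N, T->T).
    while ($diff =~ /([^\x0b\x1b\x03\x79\x60])/g) {
        push @out, [ pos($diff), $change{$1} ];
    }
    return @out;
}

for my $c (find_changes('ACGTA', 'ACNTG')) {
    printf "%d: %s\n", @$c;    # reports the G->N and A->G changes
}
```

With this mapping, an A->N change is reported as such instead of being misread as another substitution. The exclusion class could also be generated from the diagonal of `%change` instead of being hard-coded; hard-coding it just keeps the loop identical to the line above.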
Re^7: mismatching characters in dna sequence by prbndr (Acolyte) on Dec 30, 2011 at 05:54 UTC