
Eliya, is your transliteration correct? The results differ from the other code snippets already posted.

Re^4: mismatching characters in dna sequence
by Eliya (Vicar) on Dec 30, 2011 at 05:00 UTC

    AFAICT, they only differ in being zero- vs. one-based: in my output, "4" means the 4th character. One can trivially be converted to the other by adding or subtracting 1. Or which difference are you referring to?

      In some very rare cases I have a conversion from A to N (the N is just another character). The other code snippets catch this type of conversion, whereas your code reports it as G->A. Why is this the case?

        That's because you didn't mention the "N" in your original post :)

        The idea behind the transliteration is that every (directed) pair of characters yields a different XOR value, so the value alone tells you which change occurred. That uniqueness can only be guaranteed for a predefined set of allowed characters.
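A quick way to check that property for a candidate mapping is to XOR every directed pair and look for repeats. This is only a sketch of mine, not code from the earlier posts; `xor_values_unique` is a hypothetical helper name, and it assumes that only the second sequence gets transliterated (the first is used as-is).

```perl
use strict;
use warnings;

# Return 1 if every directed pair (char from seq1, char from seq2)
# produces a distinct XOR byte, 0 otherwise.
# $from is the allowed alphabet, $to the tr/// replacement list applied
# to the second sequence (position i of $from maps to position i of $to).
sub xor_values_unique {
    my ($from, $to) = @_;
    my @src = split //, $from;
    my @dst = split //, $to;
    my %seen;
    for my $i (0 .. $#src) {              # character in the first sequence
        for my $j (0 .. $#src) {          # transliterated char of the second
            my $x = $src[$i] ^ $dst[$j];  # string XOR of two 1-char strings
            return 0 if $seen{$x}++;      # same byte for two different pairs
        }
    }
    return 1;
}

print xor_values_unique("ATCGN", "J4XD7") ? "unique\n" : "collision\n";
print xor_values_unique("ATCGN", "ATCGN") ? "unique\n" : "collision\n";
```

The second call shows why a transliteration is needed at all: without one, every "no change" pair XORs to `\0`, so A->A and C->C become indistinguishable.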

        To also allow "N", you could (for example) use the transliteration tr/ATCGN/J4XD7/. With this, the XOR values for the respective changes would compute as:

        XOR val    change
        \x0b   =>  A->A *   ( "A" ^ "J" )
        \x19   =>  A->C     ( "A" ^ "X" )
        \x05   =>  A->G     ( "A" ^ "D" )
        \x76   =>  A->N     ...
        \x75   =>  A->T
        \x09   =>  C->A
        \x1b   =>  C->C *
        \x07   =>  C->G
        \x74   =>  C->N
        \x77   =>  C->T
        \x0d   =>  G->A
        \x1f   =>  G->C
        \x03   =>  G->G *
        \x70   =>  G->N
        \x73   =>  G->T
        \x04   =>  N->A
        \x16   =>  N->C
        \x0a   =>  N->G
        \x79   =>  N->N *
        \x7a   =>  N->T
        \x1e   =>  T->A
        \x0c   =>  T->C
        \x10   =>  T->G
        \x63   =>  T->N
        \x60   =>  T->T *
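Rather than typing such a table by hand, it can be generated from the mapping itself. The sketch below (my own, hedged) builds a hash `%change_for` (a hypothetical name) from XOR byte to change label for tr/ATCGN/J4XD7/, again assuming only the second sequence is transliterated, and prints the rows with the no-change entries starred.

```perl
use strict;
use warnings;

# Build XOR byte => "X->Y" for tr/ATCGN/J4XD7/ applied to the second sequence.
my @src = split //, "ATCGN";
my @dst = split //, "J4XD7";

my %change_for;
for my $i (0 .. $#src) {
    for my $j (0 .. $#src) {
        my $x = $src[$i] ^ $dst[$j];            # first char ^ transliterated second
        $change_for{$x} = "$src[$i]->$src[$j]";
    }
}

# Print sorted by change label; mark rows where nothing changed.
for my $x (sort { $change_for{$a} cmp $change_for{$b} } keys %change_for) {
    my ($f, $t) = split /->/, $change_for{$x};
    printf "\\x%02x => %s%s\n", ord($x), $change_for{$x}, $f eq $t ? " *" : "";
}
```

The same hash is handy in the matching loop for turning each mismatch byte back into a readable label such as `A->N`.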

        The entries marked with "*" are the "no-change" cases; together they make up the excluded character set in the final match. That is, with the modified transliteration above, you should change that line to

        while ($diff =~ /([^\x0b\x1b\x03\x79\x60])/g) {
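For completeness, here is a self-contained sketch of the whole comparison using that five-character exclusion class. It is not the exact code posted earlier in the thread: the two sequences are made-up sample data, `%change_for` is a hypothetical lookup hash, and it assumes both sequences are aligned and of equal length, with only the second one transliterated.

```perl
use strict;
use warnings;

# Made-up, aligned sample sequences of equal length.
my $seq1 = "ACGTNACGTA";
my $seq2 = "ACGANACNTA";

# Transliterate a copy of the second sequence, then XOR it byte-wise
# with the first; each byte of $diff encodes one directed comparison.
(my $tr2 = $seq2) =~ tr/ATCGN/J4XD7/;
my $diff = $seq1 ^ $tr2;

# XOR byte => change label, derived from the same mapping.
my %change_for;
my @src = split //, "ATCGN";
my @dst = split //, "J4XD7";
for my $i (0 .. $#src) {
    for my $j (0 .. $#src) {
        $change_for{ $src[$i] ^ $dst[$j] } = "$src[$i]->$src[$j]";
    }
}

# Skip the five "no-change" bytes; anything else is a mismatch.
my @found;
while ($diff =~ /([^\x0b\x1b\x03\x79\x60])/g) {
    # pos() is the offset just past the match, i.e. the one-based
    # position of the mismatch (subtract 1 for zero-based).
    my $pos = pos($diff);
    push @found, "$pos $change_for{$1}";
    print "$pos: $change_for{$1}\n";
}
```

With these inputs it prints `4: T->A` and `8: G->N`; the N at position 5 is an unchanged N->N and is skipped.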