in reply to Re^4: Filtering matches of near-perfect-matched DNA sequence pairs
in thread Filtering matches of near-perfect-matched DNA sequence pairs

Then what actual input produced the strings with dashes in them? Are all strings containing a '-' results rather than inputs?

This only makes it less clear exactly what the inputs to your problem are. You are showing sequences with dashes and sequences with only nine characters in them, which contradicts your original problem statement (two 10-character strings).

Please make up a test file with the *real* input sequences in it, and also show the expected output for each input.

Re^6: Filtering matches of near-perfect-matched DNA sequence pairs
by BrowserUk (Patriarch) on Mar 15, 2015 at 07:19 UTC

    I suspect, based on previous similar questions of this type, that what the OP has is:

    1. A bunch of long sequences, possibly thousands or hundreds of thousands of bytes/codons/other units long.
    2. A bunch of shorter sequences, perhaps all 10 characters long, perhaps a mix of 9 and 10 characters.

    And the process he's trying to code is:

    • For each of the long sequences...
    • For each of the short sequences...
    • Scan the longer sequence looking for sites where the shorter sequence 'matches' and record those positions.
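
    If that guess is right, the outer process might look roughly like the sketch below. It is only an illustration of the three steps above, not the OP's code: the names scan and exact_match, and the choice to key the results by (long index, short index), are my assumptions. The matching rule is passed in as a function so that a different test can be swapped in.

```python
from typing import Callable, Dict, List, Tuple


def exact_match(window: str, probe: str) -> bool:
    """Placeholder matching rule: the window must equal the probe."""
    return window == probe


def scan(
    long_seqs: List[str],
    short_seqs: List[str],
    matches: Callable[[str, str], bool] = exact_match,
) -> Dict[Tuple[int, int], List[int]]:
    """Record, per (long index, short index), the 0-based match positions."""
    hits: Dict[Tuple[int, int], List[int]] = {}
    for li, long_seq in enumerate(long_seqs):        # each long sequence
        for si, probe in enumerate(short_seqs):      # each short sequence
            width = len(probe)
            # Slide a probe-sized window along the long sequence.
            positions = [
                pos
                for pos in range(len(long_seq) - width + 1)
                if matches(long_seq[pos:pos + width], probe)
            ]
            if positions:
                hits[(li, si)] = positions
    return hits
```

    For example, scan(["ACGTACGTAC"], ["ACGT", "GGG"]) records positions 0 and 4 for the first short sequence and nothing for the second.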

    The complication is that the matching is 'fuzzy' within his set of constraints:

    • A 9-character subsection of the longer sequence may be considered a match for a 10-character short sequence if (for example) removing exactly one character from the 10-character sequence makes it identical to that 9-character subsection.
    • Or, a 10-character short sequence may be considered a match for a 10-character subsection of the longer sequence if they differ in exactly (only) one position.
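
    To make those two rules concrete, here is a minimal sketch of them as I have guessed them. The function names are hypothetical, and whether an exact match should also count is not something the OP has stated. Here it does not, because the second rule says "exactly one" difference.

```python
from typing import List, Tuple


def differs_in_one_position(window: str, probe: str) -> bool:
    """True if both strings have equal length and exactly one mismatch."""
    if len(window) != len(probe):
        return False
    return sum(a != b for a, b in zip(window, probe)) == 1


def matches_after_one_deletion(window: str, probe: str) -> bool:
    """True if deleting exactly one character from probe yields window."""
    if len(window) != len(probe) - 1:
        return False
    return any(probe[:i] + probe[i + 1:] == window for i in range(len(probe)))


def fuzzy_positions(long_seq: str, probe: str) -> List[Tuple[int, str]]:
    """Return (0-based position, rule name) for every fuzzy hit of probe."""
    n = len(probe)
    hits: List[Tuple[int, str]] = []
    # The last start position only has room for the shorter (n - 1) window.
    for pos in range(len(long_seq) - n + 2):
        if differs_in_one_position(long_seq[pos:pos + n], probe):
            hits.append((pos, "substitution"))
        if matches_after_one_deletion(long_seq[pos:pos + n - 1], probe):
            hits.append((pos, "deletion"))
    return hits
```

    This brute-force version tries every deletion at every position, which is fine for 10-character probes but would want something smarter if the long sequences really run to hundreds of thousands of characters and there are many probes.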

    But, that's just supposition until he answers somebody's questions!


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked