in reply to Re^2: finding tuples
in thread finding tuples

So let me summarize:

That's the only set of rules that I could extract from what you wrote so far. However that doesn't explain this: Sometimes it is not advisable to form a sameness-tuple because then I destroy a solution involving multiple sequences

What is the exact rule of how one tuple can "destroy" a solution, if overlap is allowed anyway?

Replies are listed 'Best First'.
Re^4: finding tuples
by ikegami (Patriarch) on Jun 23, 2009 at 20:55 UTC

    You look for a maximal set of ordered four-tuples that each consist of either four identical or four completely different letters

    The OP said "same or sequential alphabetic neighbours" and later said the alphabet is ADEFGMSTV. He also clarified that order isn't important.

    My best guess is that the following are allowed

    • AAAA
    • ADEF
    • AAAD

    and the following aren't

    • AEFG (A doesn't neighbour E, F or G)

    Please confirm.

    What is the exact rule of how one tuple can "destroy" a solution, if overlap is allowed anyway?

    A given instance of a letter can be reused. He's talking about overlaps in the input string.

    +-+-+-+----- tuple 1 matches character between indexes 0 and 6 | | | | AADDEEFF overlap over indexes 1 to 6 | | | | +-+-+-+---- tuple 2 matches character between indexes 1 and 7

      Almost right. AAAD is not allowed. It is neither 4 same nor 4 consecutive.

      You explained overlap correctly - it is an artifact from making a string out of a set.

Re^4: finding tuples
by Anonymous Monk on Jun 24, 2009 at 07:57 UTC

    I must correct rule 2: I am looking for the complete set of unordered 4-tuples, either four identical or four consecutive. Four consecutive is any true 4-substring of ADEFGMSTV.

    Sometimes it is not advisable to form a sameness-tuple

    I will explain on a minimal example. Split up the set AADDDEEEEFFFFGGM. It is tempting to first take out the 4 same E and F, but then the remainder is AADDDGGMM from which a solution cannot be formed any more because they are not alphabetic neighbours. It is false to give up now and assume that aforementioned set has no solution as there *is* a (single) solution: ADEF;ADEF;DEFG;EFGM. So one has to be smart what to pick first to not end up in dead ends.

    Rarely there are multiple solutions, but they do occur. Trivial example is GGGGMMMMSSSSTTTT. Its two solutions are GGGG;MMMM;SSSS;TTTT and GMST;GMST;GMST;GMST.