in reply to Bracketing Substring(s) in the String

The code itself is okay, except that arrays 3 and 4 do not contain a non-overlapping set of substrings of the string they operate on, so they will not meet the substitution criteria later in the program. To get the desired result:
my @a3 = qw( CCACCAGCACC );
my @a4 = qw( CCAACACC );

One world, one people

Re^2: Bracketing Substring(s) in the String
by InfiniteSilence (Curate) on Aug 25, 2005 at 16:30 UTC
    Take a closer look at the original strings. The second two strings match multiple substrings, and the desired output does show this, though in a somewhat confusing way. Take this example:
    STRING        SUBSTRINGS
    -------       ----------
    GCGCTCGACGC   GCGC ACG     == [GCGC]TCG[ACG]C
    But when the sequences OVERLAP, the output should 'mash up' a bit:
    STRING        SUBSTRINGS
    -------       ----------
    GCGCTCGACGC   GCGC GCTC    == [GCGCTC]GACGC
    Do you see how GCGC AND GCTC MERGE into one single substring for the desired output?

    So I think the algorithm should look like this:

  • Make as many straight matches as you can
  • If a new match overlaps a region of the string that has already been matched, extend that region to include the new match
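    The two steps above can be sketched roughly as follows. This is only one possible reading of the algorithm, not the original poster's code: the name bracket_substrings is made up for illustration, every occurrence of every substring is recorded as a span, and spans are merged only when they actually overlap (spans that merely touch are assumed to stay separate).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): bracket every region of
# $string that is covered by at least one of @substrings, merging overlaps.
sub bracket_substrings {
    my ($string, @substrings) = @_;

    # Step 1: record [start, end) for every occurrence of every substring.
    my @spans;
    for my $sub (@substrings) {
        next unless length $sub;
        my $pos = 0;
        while ((my $i = index($string, $sub, $pos)) >= 0) {
            push @spans, [$i, $i + length($sub)];
            $pos = $i + 1;    # advance by one so self-overlapping hits are found
        }
    }

    # Step 2: sort by start offset and fold overlapping spans together.
    # Assumption: spans that only touch (end == next start) are NOT merged.
    my @merged;
    for my $span (sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @spans) {
        if (@merged && $span->[0] < $merged[-1][1]) {
            $merged[-1][1] = $span->[1] if $span->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$span];
        }
    }

    # Step 3: insert brackets from right to left so earlier offsets stay valid.
    for my $span (reverse @merged) {
        substr($string, $span->[1], 0, ']');
        substr($string, $span->[0], 0, '[');
    }
    return $string;
}

print bracket_substrings('GCGCTCGACGC', qw(GCGC GCTC)), "\n";   # [GCGCTC]GACGC
```

    Collecting the spans first and inserting the brackets at the end keeps the bookkeeping simple: the string is never modified while it is still being searched, so the offsets returned by index stay meaningful even with many substrings.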

    Can you imagine how messy this would look if you had 100 substrings and a main string running 10,000 letters long (which I assume is possible because this stuff looks like gene sequence data)?

    Celebrate Intellectual Diversity