Re: Bracketing Substring(s) in the String

The code itself is okay, except that arrays 3 and 4 do not contain a non-overlapping set of substrings of the string they operate on, and therefore will not meet the substitution criteria later in the program. To get the desired result, use:

my @a3 = qw( CCACCAGCACC );
my @a4 = qw( CCAACACC );
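The original program isn't shown in this thread, so the following is only a minimal sketch under an assumption: that the program brackets each substring in turn with a global s///. The helper name naive_bracket and the sample sequence are hypothetical. It illustrates why overlapping substrings cause trouble: once the first substring is wrapped in brackets, an overlapping second substring is no longer contiguous in the string, so it silently fails to match.

```perl
use strict;
use warnings;

# Hypothetical naive approach: bracket each substring in turn with s///g.
# \Q...\E makes the substring match literally.
sub naive_bracket {
    my ($str, @subs) = @_;
    for my $sub (@subs) {
        $str =~ s/(\Q$sub\E)/[$1]/g;
    }
    return $str;
}

# GCGC and GCTC overlap at "GC" in this sequence.
print naive_bracket('GCGCTCGACGC', qw(GCGC GCTC)), "\n";   # [GCGC]TCGACGC
print naive_bracket('GCGCTCGACGC', qw(GCTC GCGC)), "\n";   # GC[GCTC]GACGC
```

Whichever substring is bracketed first wins and the overlapping one is dropped, so the result depends on the order of the array; a set of non-overlapping substrings avoids that.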

One world, one people


Re^2: Bracketing Substring(s) in the String
by InfiniteSilence (Curate) on Aug 25, 2005 at 16:30 UTC

Take a closer look at the actual original strings. The second two strings match multiple substrings, so the desired output does show this, but in a somewhat confusing way. Take this example:

STRING
-------
GCGCTCGACGC

SUBSTRINGS
----------
GCGC ACG    ==  [GCGC]TCG[ACG]C

But when we have OVERLAPPING sequences, the output should 'mash up' a bit:

STRING
-------
GCGCTCGACGC

SUBSTRINGS
----------
GCGC GCTC  == [GCGCTC]GACGC

Do you see how GCGC AND GCTC MERGE into one single substring in the desired output? So I think the algorithm should look like this:

Make as many straight matches as you can.
If your match overlaps a region that has already been matched, extend that match to include the new match.

Can you imagine how messy this would look if you had 100 substrings and a main string 10,000 letters long (which I assume is possible, because this stuff looks like gene sequence data)?

Celebrate Intellectual Diversity
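The two rules above can be sketched in Perl roughly as follows. This is a minimal illustration under stated assumptions, not the poster's program: the helper name bracket_substrings is hypothetical, every occurrence of each substring is found (including overlapping occurrences, via a zero-width lookahead), and only regions that genuinely overlap are merged, so regions that merely touch are bracketed separately.

```perl
use strict;
use warnings;

# Hypothetical helper: bracket every occurrence of each substring in $str,
# merging overlapping matches into one bracketed region.
sub bracket_substrings {
    my ($str, @subs) = @_;
    my @spans;    # [start, end) pairs; end is exclusive

    # Rule 1: collect all straight matches. The lookahead is zero-width,
    # so overlapping occurrences of the same substring are found too.
    for my $sub (@subs) {
        next unless length $sub;
        my $len = length $sub;
        while ($str =~ /(?=\Q$sub\E)/g) {
            push @spans, [ $-[0], $-[0] + $len ];    # $-[0] = match start
        }
    }
    return $str unless @spans;

    # Rule 2: sort by start; a span that overlaps the current region
    # extends it instead of starting a new one.
    @spans = sort { $a->[0] <=> $b->[0] } @spans;
    my @merged = ( [ @{ $spans[0] } ] );
    for my $s (@spans[ 1 .. $#spans ]) {
        if ($s->[0] < $merged[-1][1]) {
            $merged[-1][1] = $s->[1] if $s->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$s ];
        }
    }

    # Insert brackets right to left so earlier offsets stay valid.
    for my $m (reverse @merged) {
        substr($str, $m->[1], 0, ']');
        substr($str, $m->[0], 0, '[');
    }
    return $str;
}

print bracket_substrings('GCGCTCGACGC', qw(GCGC ACG)),  "\n";   # [GCGC]TCG[ACG]C
print bracket_substrings('GCGCTCGACGC', qw(GCGC GCTC)), "\n";   # [GCGCTC]GACGC
```

Collecting all match positions first and merging them afterwards avoids editing the string while still searching it, so the result no longer depends on the order of the substrings. Each substring is scanned once and the spans are sorted once, which should stay manageable even for the 100-substring, 10,000-letter case mentioned above.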
