in reply to Re: Bracketing Substring(s) in the String

Dear sgifford,
First of all, I apologize for coming back to you with this question after some time.

Because your solution above is so important to me, I have to turn to you again. I truly don't know how to go about this, and I hope you won't mind.

Your code above gives correct answers 99% of the time, except in the following case.
Example 1:
# Given: my $s5 = 'CTGGGTATGGGT'; my @a5 = qw(GTATG TGGGT);
Your code above returns
C[TGGGTATGGGT]
instead of the correct answer:
CTGG[GTATGGGT]
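The explanation is as follows. TGGGT occurs twice in $s5 (at offsets 1 and 7), and GTATG occurs once (at offset 4):
$s5 = "CTGGGTATGGGT";
        TGGGT
           GTATG    --|
              TGGGT --|-- only these two satisfy
Only GTATG (offset 4) and the second TGGGT (offset 7) satisfy the requirement, because they follow the order and span of the given array.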
Why is the latter correct? In the array @a5 = qw(GTATG TGGGT), "TGGGT" comes after "GTATG", so the bracketed region should follow the order of the array, and its span should be delimited by the array elements. In other words, the bracketed regions, whether disjoint or overlapping, should always start with the first element of the array and end with the last element of the array.
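
The ordering rule above can be sketched as a greedy search: starting from one occurrence of the first element, each following element is taken at the earliest offset after the previous element's start. This is only a sketch under assumptions the post does not state: `ordered_starts` is a hypothetical name, matching is exact and case-sensitive, and "in order" is read as strictly increasing start offsets, with overlaps allowed.

```perl
use strict;
use warnings;

# Hypothetical helper (assumption: "in order" = strictly increasing starts).
# Given the offset $first of the first element, place each later element at
# the earliest offset after the previous element's start.
# Returns the list of start offsets, or an empty list if placement fails.
sub ordered_starts {
    my ( $str, $subs, $first ) = @_;
    return () if $first < 0
        || substr( $str, $first, length $subs->[0] ) ne $subs->[0];
    my @starts = ($first);
    for my $i ( 1 .. $#$subs ) {
        my $pos = index( $str, $subs->[$i], $starts[-1] + 1 );
        return () if $pos < 0;    # this element never appears in order
        push @starts, $pos;
    }
    return @starts;
}

my $s5 = 'CTGGGTATGGGT';
my @a5 = qw(GTATG TGGGT);
my @starts = ordered_starts( $s5, \@a5, index( $s5, $a5[0] ) );
print "@starts\n";    # prints "4 7"
```

Taking the earliest possible offset at each step keeps as much of the string as possible available for the remaining elements, so for a fixed first occurrence it yields the shortest in-order span.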

Let me give some more examples; I hope they clarify things.
Example 2:
# Given: my $s6 = 'AGGAACTTGCCTGTACCACAGGAAG'; my @a6 = qw( CAGGA AGGAA );
The current solution gives:
[AGGAA]CTTGCCTGTACCA[CAGGAA]G
The correct answer is:
AGGAACTTGCCTGTACCA[CAGGAA]G
To simplify matters: if more than one region bounded by the array is possible, taking any single one of them will do. Here are two more examples:
Example 3:
# Given: my $s7 = 'CAGGACTTGCCTGTACCACAGGAAG'; my @a7 = qw( CAGGA );
This answer would do:
CAGGACTTGCCTGTACCA[CAGGA]AG
Example 4:
# Given: my $s8 = 'CAGGATTTGAGGAAGTACCACAGGAAG'; my @a8 = qw( CAGGA AGGAA );
This answer would do, taking the occurrences closest together:
CAGGATTTGAGGAAGTACCA[CAGGAA]G
It doesn't have to be this:
[CAGGA]TTTG[AGGAA]GTACCACAGGAAG
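
Combining the ordering rule with the "closest together" preference, one possible sketch (not sgifford's code; `bracket_tightest` is a hypothetical name) tries every occurrence of the first element, places the remaining elements greedily in order, keeps the placement with the shortest span, and then wraps each contiguous covered run in brackets. It assumes exact, case-sensitive matching and strictly increasing start offsets.

```perl
use strict;
use warnings;

# Hypothetical sketch: bracket the shortest in-order placement of @subs in $str.
sub bracket_tightest {
    my ( $str, @subs ) = @_;
    return $str unless @subs;
    my ( $best, $best_len );
    my $first = index( $str, $subs[0] );
    while ( $first >= 0 ) {
        # Greedy placement: each element at the earliest start after the previous one.
        my @starts = ($first);
        for my $i ( 1 .. $#subs ) {
            my $pos = index( $str, $subs[$i], $starts[-1] + 1 );
            last if $pos < 0;
            push @starts, $pos;
        }
        if ( @starts == @subs ) {
            my $span = $starts[-1] + length( $subs[-1] ) - $first;
            # "<=" keeps the later window on ties (any single region is acceptable).
            if ( !defined $best_len || $span <= $best_len ) {
                ( $best, $best_len ) = ( [@starts], $span );
            }
        }
        $first = index( $str, $subs[0], $first + 1 );
    }
    return $str unless $best;    # no in-order placement exists

    # Mark every covered position, then wrap each contiguous run in [ ].
    my @covered = (0) x length $str;
    for my $i ( 0 .. $#subs ) {
        $covered[$_] = 1 for $best->[$i] .. $best->[$i] + length( $subs[$i] ) - 1;
    }
    my $out = '';
    for my $p ( 0 .. $#covered ) {
        $out .= '[' if $covered[$p] && ( $p == 0 || !$covered[ $p - 1 ] );
        $out .= substr( $str, $p, 1 );
        $out .= ']' if $covered[$p] && ( $p == $#covered || !$covered[ $p + 1 ] );
    }
    return $out;
}

print bracket_tightest( 'CAGGATTTGAGGAAGTACCACAGGAAG', qw(CAGGA AGGAA) ), "\n";
# prints "CAGGATTTGAGGAAGTACCA[CAGGAA]G"
```

On a tie the sketch keeps the later window, which happens to reproduce Example 3; replacing `<=` with `<` would keep the earlier one instead. Disjoint placements produce separate brackets, e.g. `CCC[ATCTG]TCCTT[ATTTG]CTG` for `qw(ATCTG ATTTG)`.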
I should also mention that the strings in the array always have a fixed length; in our examples, they are always 5 characters long.

Is there a way to modify your code above so that it handles such cases? I hope to hear from you again; I'll try not to bother you after this.

Update: If it helps to disambiguate, I can supply each substring together with its (0-based) starting index:
my $t1 = 'CCCATCTGTCCTTATTTGCTG'; my @ar1 = qw(ATCTG-3 ATTTG-13);
my $t2 = 'ACCCATCTGTCCTTGGCCAT'; my @ar2 = qw(CCATC-2);
my $t3 = 'CCACCAGCACCTGTC'; my @ar3 = qw(CCACC-0 CCAGC-3 GCACC-6);
my $t4 = 'CCCAACACCTGCTGCCT'; my @ar4 = qw(CCAAC-1 ACACC-4);
my $t5 = 'CTGGGTATGGGT'; my @ar5 = qw(GTATG-4 TGGGT-7);
my $t6 = 'AGGAACTTGCCTGTACCACAGGAAG'; my @ar6 = qw(CAGGA-18 AGGAA-19);
my $t7 = 'CAGGACTTGCCTGTACCACAGGAAG'; my @ar7 = qw(CAGGA-18);
my $t8 = 'CAGGATTTGAGGAAGTACCACAGGAAG'; my @ar8 = qw(CAGGA-20 AGGAA-21);
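
Because these indices are typed in by hand, a small check that each substring really starts at its stated index can catch mistakes before any bracketing happens. `bad_indices` is a hypothetical helper; it assumes the SUB-IDX format shown above and exact, case-sensitive comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: return the "SUB-IDX" entries whose substring does not
# start exactly at IDX in $str. An empty list means every index is consistent.
sub bad_indices {
    my ( $str, @pairs ) = @_;
    return grep {
        my ( $sub, $idx ) = split /-/, $_;
        # Out of range, or the text at $idx is not $sub.
        $idx + length($sub) > length($str)
            || substr( $str, $idx, length $sub ) ne $sub;
    } @pairs;
}

my $t5 = 'CTGGGTATGGGT';
my @bad = bad_indices( $t5, qw(GTATG-4 TGGGT-7) );
print @bad ? "check: @bad\n" : "all indices OK\n";    # prints "all indices OK"
```

For instance, `bad_indices($t8, 'CAGGA-18')` returns that entry, because `substr($t8, 18, 5)` is `CACAG`.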
Update 2: I think I've found the solution. Thanks so much, sgifford, and sorry for the trouble.
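sub put_bracket_wth_idx {
    my ( $str, $ar ) = @_;
    for my $subs ( @$ar ) {
        my ( $sb, $id ) = split /-/, $subs;
        # Mark only if $sb starts exactly at $id; lc on both sides because an
        # overlapping element may already have been lowercased.
        if ( lc( substr( $str, $id, length $sb ) ) eq lc($sb) ) {
            substr( $str, $id, length $sb ) =~ tr/A-Z/a-z/;
        }
    }
    # Wrap each run of marked (lowercase) letters in brackets, restoring upper case.
    $str =~ s/([a-z]+)/[\U$1\E]/g;
    return $str;
}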

Regards,
Edward