MonkPaul has asked for the wisdom of the Perl Monks concerning the following question:

Monkies,

For my current project I have to take in two DNA sequences: an enzyme recognition site (sequence 1) and the sequence itself (sequence 2).
What I'm now looking to do is search for the enzyme site (seq. 1) in the long seq. 2 and, whenever I find seq. 1 in seq. 2, highlight it in bold.

Basically, it's searching for a substring in a string and highlighting it wherever it occurs.

How can I highlight the substring?
My current code searches for the substring using the index() function, but I'm not sure how to make it bold. If it helps, the result has to be displayed in HTML.

Any help would be great.
Cheers

Fantastic. Thanks guys. I had a rough idea how to go about it but wasn't sure how to implement it. :) :)

Re: Highlight a substring
by ikegami (Patriarch) on Apr 06, 2005 at 14:48 UTC

    To add highlighting to HTML, given $start_idx and $length:

    substr($str, $start_idx, $length) = '<b>' . substr($str, $start_idx, $length) . '</b>';

    Once you add searching, it looks like:

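    # Search backwards from the end so that earlier offsets stay valid
    # after each insertion.
    my $index = length($str);
    my $replace = "<b>$seq</b>";
    while (($index = rindex($str, $seq, $index)) >= 0) {
        substr($str, $index, length($seq)) = $replace;
    }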

    Update: Optimized
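
A minimal sketch of the same index + substr idea, wrapped in a sub so it can be tried on its own. The sub name highlight and the sample sequences are made up for illustration. It scans left to right and builds a new string instead of editing $str in place, so no offsets need adjusting. It assumes the text needs no HTML escaping, which holds for plain DNA letters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wrap every non-overlapping
# occurrence of $seq in $str with <b>...</b>, scanning left to right.
sub highlight {
    my ($str, $seq) = @_;
    return $str if $seq eq '';    # an empty pattern would loop forever

    my $out = '';
    my $pos = 0;
    while ((my $idx = index($str, $seq, $pos)) >= 0) {
        $out .= substr($str, $pos, $idx - $pos);    # text before the match
        $out .= "<b>$seq</b>";                      # the highlighted match
        $pos  = $idx + length $seq;                 # resume after the match
    }
    return $out . substr($str, $pos);               # trailing text
}

# Made-up sample: an EcoRI-style site appearing twice.
print highlight('TTGAATTCAAGAATTC', 'GAATTC'), "\n";
```

Building a fresh string costs one extra copy, but it keeps the loop free of index arithmetic on a string that is changing underneath it.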

Re: Highlight a substring
by JediWizard (Deacon) on Apr 06, 2005 at 14:52 UTC

    see: index and substr

    my $string1 = 'really long string of html';
    my $string2 = 'long';
    my $replace = '<b>' . $string2 . '</b>';
    my $index = index($string1, $string2);
    while ($index > 0) {
        substr($string1, $index, length($string2), $replace);
        # Resume after the inserted markup so the same match is not found again.
        $index = index($string1, $string2, $index + length($replace));
    }

    A truly compassionate attitude towards others does not change, even if they behave negatively or hurt you

    —His Holiness, The Dalai Lama

      That should be
      while ($index >= 0) {
      and not
      while ($index > 0) {

      I have more of a question than a comment. The first thing I would have thought to do was this...
      my $string1 = 'really long string of html and here is the word long again';
      my $string2 = 'long';

      $string1 =~ s/($string2)/<b>$1<\/b>/g;

      print "string1: $string1\n";

      I know TMTOWTDI, but is there a reason that the two responses to MonkPaul's question didn't involve s///g?

      Thanks again oh wise monks.
        index is much faster than m// when matching a constant string. I'm guessing index + substr is similarly faster than s///. Feel free to benchmark the two.
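
One possible way to run that comparison, using the core Benchmark module. This is only a sketch: the test data, the sub names with_index and with_subst, and the run time are all made up, and the numbers will depend on the Perl version and on the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: a 6-base site scattered through a long sequence.
my $seq  = 'GAATTC';
my $unit = 'ACGTTGCA' x 50 . $seq;
my $dna  = $unit x 200;

# Variant 1: index + 4-argument substr, editing a copy in place.
sub with_index {
    my $str     = $dna;
    my $replace = "<b>$seq</b>";
    my $pos     = 0;
    while (($pos = index($str, $seq, $pos)) >= 0) {
        substr($str, $pos, length $seq, $replace);
        $pos += length $replace;    # skip past the inserted markup
    }
    return $str;
}

# Variant 2: a single global substitution.
sub with_subst {
    my $str = $dna;
    $str =~ s/(\Q$seq\E)/<b>$1<\/b>/g;
    return $str;
}

# Both variants must agree before the timings mean anything.
die "variants disagree" unless with_index() eq with_subst();

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, { index_substr => \&with_index, subst => \&with_subst });
```

Checking that both subs return the same string first guards against timing a variant that is fast only because it is wrong.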

        By the way,
        $string1 =~ s/($string2)/<b>$1<\/b>/g;
        should be replaced with
        $string1 =~ s/(\Q$string2\E)/<b>$1<\/b>/g;
        in case $string2 contains special characters. Better yet,
        $string1 =~ s/((?:\Q$string2\E)+)/<b>$1<\/b>/g;
        should be a little bit faster, since a run of adjacent matches is wrapped in a single pair of tags.
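
A small illustration of why the \Q...\E quoting matters. The inputs are invented for the demo: the search string contains a dot, which is a regex metacharacter.

```perl
use strict;
use warnings;

# Made-up inputs: the search string contains '.', a regex metacharacter.
my $text   = 'AGC A.C ATC';
my $needle = 'A.C';

# Without \Q the dot matches any character, so all three words are wrapped.
(my $unquoted = $text) =~ s/($needle)/<b>$1<\/b>/g;

# With \Q...\E the dot is literal, so only the exact text 'A.C' is wrapped.
(my $quoted = $text) =~ s/(\Q$needle\E)/<b>$1<\/b>/g;

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

Plain A/C/G/T input never triggers this, but a recognition site written with extra notation could.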

        I'd also Benchmark the following, which has fewer concatenations. (Perl's lookbehind cannot contain an unbounded quantifier such as +, so the pattern is not repeated here.)

        $string1 =~ s#(?=\Q$string2\E)#<b>#g;
        $string1 =~ s#(?<=\Q$string2\E)#</b>#g;