Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Wise monks, I have a question. How can I substitute every occurrence of cat with <a href=x>cat</a> and every occurrence of Cat with <a href=x>Cat</a>? As a first attempt, I defined
$term='cat'; $newterm="<a href=x>$term</a>"; $text=~s/(\W)$term(\W)/$1$newterm$2/i;
but this turns Cat into <a href=x>cat</a>; that is, I lose the capitalization of the term. Thank you for sharing your wisdom.

Re: substitution preserving capitilization
by joealba (Hermit) on Dec 17, 2002 at 18:47 UTC
    $text =~ s|\b($term)\b|<a href=x>$1</a>|ig;

    Capturing your search term in case-insensitive mode still preserves the original case when you use the captured text in your replacement.

    This matches on word boundaries, rather than the \W non-word characters you were using.
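
To make the difference between \W and \b concrete, here is a minimal sketch that runs both approaches on two sample strings. The sub names link_with_W and link_with_b and the sample strings are made up for illustration; the two regexes are the ones from the question and from this reply, with /g added to both.

```perl
use strict;
use warnings;

my $term = 'cat';

# Approach from the question: needs a real non-word character on each side,
# and inserts $term itself, so the original capitalization is lost.
sub link_with_W {
    my ($text) = @_;
    $text =~ s/(\W)$term(\W)/$1<a href=x>$term<\/a>$2/ig;
    return $text;
}

# Word-boundary approach: \b is zero-width and consumes nothing,
# and $1 holds the term exactly as it appeared in the text.
sub link_with_b {
    my ($text) = @_;
    $text =~ s|\b($term)\b|<a href=x>$1</a>|ig;
    return $text;
}

for my $sample ('Cat sat', 'a cat cat nap') {
    print "W: ", link_with_W($sample), "\n";
    print "b: ", link_with_b($sample), "\n";
}
# W: Cat sat
# b: <a href=x>Cat</a> sat
# W: a <a href=x>cat</a> cat nap
# b: a <a href=x>cat</a> <a href=x>cat</a> nap
```

The \W version misses a term at the very start of the string, and it misses the second of two adjacent terms because the space between them was already consumed by the first match.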
Re: substitution preserving capitilization
by jdporter (Paladin) on Dec 17, 2002 at 18:47 UTC
    The key is to capture the term that you actually matched, and use that in the substitution.
    Something like this:
    $term = 'cat'; $text =~ s/\b($term)\b/<a href=x>$1</a>/i;
    (Note: you don't need to capture the stuff before and after the term, just to put them back in the replacement. That's what happens by default.)

    By taking this approach, you can make $term be an alternation of all the possibilities, like so:
    $term = join '|', 'cat', 'Cat', 'dog', 'Dog';
    Then, whichever one matches, that will be seen in $1 in the replacement string.
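
A rough sketch of the alternation idea follows. The term list, the sample sentence, and the sub name link_terms are hypothetical. It also assumes the terms are meant literally, hence quotemeta, and that /i is in effect, so one spelling per word is enough.

```perl
use strict;
use warnings;

# Hypothetical term list; with /i there is no need to list 'Cat' and 'Dog' too.
my @terms = ('cat', 'dog');

# quotemeta escapes regex metacharacters so each term is matched literally.
my $alternation = join '|', map { quotemeta($_) } @terms;

sub link_terms {
    my ($text) = @_;
    # The parentheses both group the alternation and capture what matched.
    $text =~ s/\b($alternation)\b/<a href=x>$1<\/a>/ig;
    return $text;
}

print link_terms('My Cat chased a DOG past the catalog.'), "\n";
# prints: My <a href=x>Cat</a> chased a <a href=x>DOG</a> past the catalog.
```

Because the whole alternation sits inside one pair of parentheses, the \b anchors apply to every alternative, which is why catalog is left alone.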

    jdporter
    ...porque es dificil estar guapo y blanco.

Re: substitution preserving capitilization
by Enlil (Parson) on Dec 17, 2002 at 18:49 UTC
    Here is one solution:
    my @term = ("My Cat", "your CAt", "thiS CAT", "THAt caT", "their cAT",
                "dead cAt", "cAtostrophic cat");
    my $search = 'cat';
    for ( @term ) {
        s!\b($search)\b!<a href=x>$1</a>!i;
        print $_, $/;
    }

    This finds every "cat" (the i at the end of the regular expression makes the match case-insensitive), captures it in $1, and prints each string so you can see what has changed. The \b on both sides ensures that there is a word boundary on each side of cat, so words like cAtostrophic are not changed. Note that this replaces only the first cat in each string; add a g at the end of the regular expression if you want to catch more.
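
To illustrate the effect of the g modifier, here is a small sketch that applies the same substitution with and without it. The sub names link_first and link_all and the sample string are invented for this example. It relies on s/// returning the number of substitutions made, or a false value when nothing matched.

```perl
use strict;
use warnings;

my $search = 'cat';

# Without /g: at most one substitution per string.
sub link_first {
    my ($text) = @_;
    my $count = ($text =~ s!\b($search)\b!<a href=x>$1</a>!i) || 0;
    return ($text, $count);
}

# With /g: every whole-word match in the string is replaced.
sub link_all {
    my ($text) = @_;
    my $count = ($text =~ s!\b($search)\b!<a href=x>$1</a>!ig) || 0;
    return ($text, $count);
}

my $sample = 'CAT, cat and cAtostrophic Cat';
for my $linker (\&link_first, \&link_all) {
    my ($result, $count) = $linker->($sample);
    print "$count: $result\n";
}
# 1: <a href=x>CAT</a>, cat and cAtostrophic Cat
# 3: <a href=x>CAT</a>, <a href=x>cat</a> and cAtostrophic <a href=x>Cat</a>
```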

    -enlil