water has asked for the wisdom of the Perl Monks concerning the following question:

Hi.

I'm having trouble writing a regexp to tag text with markup that preserves case. For example, say I want to replace

/\b[Ca]at\b/
with
<span class="topic">Cat</a>
or
<span class="topic">cat</a>
depending on the original case of 'cat'.

The /i flag helps with the match, but I'm unsure how to get the right case on the first letter.

I'm matching many patterns against many pieces of text.

thanks!

water

Re: case preservation in regexp
by cog (Parson) on Feb 10, 2005 at 11:56 UTC
    I think that was /\b[Cc]at\b/

    Do it like this:

    s{\b([Cc])at\b}{<span class="topic">${1}at</a>};

    By catching the "c" (be it "C" or "c") in $1 and using it in the substitution, you're preserving its case.

    Adjust to fit your particular need :-)

Re: case preservation in regexp
by BrowserUk (Patriarch) on Feb 10, 2005 at 11:58 UTC

    This will probably do what you are asking for:

    $text =~ s[\b([Cc]at)\b][<span class="topic">$1</a>]g;

    but you also probably want </span> rather than </a> as requested :)
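
    A self-contained sketch of the whole-word capture, assuming </span> is the intended closing tag. The variable names and the sample sentence are made up for illustration:

```perl
use strict;
use warnings;

# Sample input (illustrative only).
my $text = 'Cat chases cat, but not a caterpillar.';

# Capture the whole word so that $1 carries whatever case was matched.
# The \b anchors keep "caterpillar" from being tagged.
(my $tagged = $text) =~ s[\b([Cc]at)\b][<span class="topic">$1</span>]g;

print "$tagged\n";
# <span class="topic">Cat</span> chases <span class="topic">cat</span>, but not a caterpillar.
```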


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: case preservation in regexp
by CountZero (Bishop) on Feb 10, 2005 at 12:38 UTC
    Or print "<span class='topic'>$1</span>" if /\b(cat)\b/i;

    Update: Added the missing parentheses (thanks halley)

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Parentheses matter. You didn't capture a $1.
      print "<span class='topic'>$1</span>" if /\b(cat)\b/i;
      If only the first letter should be flexible,
      print "<span class='topic'>$1</span>" if /\b([Cc]at)\b/;
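
      To see how the two patterns differ, here is a small sketch (the word list is invented for the comparison). With /i every casing matches and $1 holds the text as it appeared; with [Cc] only the first letter may vary:

```perl
use strict;
use warnings;

my @words = ('Cat', 'cat', 'CAT', 'cAt');

# Case-insensitive match: all four casings are accepted.
my @loose = grep { /\b(cat)\b/i } @words;

# Character class: only "Cat" and "cat" are accepted.
my @strict = grep { /\b([Cc]at)\b/ } @words;

print "with /i:   @loose\n";    # with /i:   Cat cat CAT cAt
print "with [Cc]: @strict\n";   # with [Cc]: Cat cat
```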

      --
      [ e d @ h a l l e y . c c ]

Re: case preservation in regexp
by revdiablo (Prior) on Feb 10, 2005 at 17:45 UTC

    I don't want to jump to conclusions, but just in case, I'd like to mention that parsing HTML with regular expressions is usually a bad idea. There are many common pitfalls, and in the presence of some good CPAN modules, there's no reason to do it.

    So even though your examples don't actually show HTML being parsed, it might be worthwhile to look at HTML::Parser, HTML::TokeParser, HTML::TokeParser::Simple, and perhaps HTML::TreeBuilder. Those are good general purpose modules. There are also some good specific purpose modules, such as HTML::TableExtract, that could be useful. Take some time searching CPAN, or asking around, and you may find something that makes your task a lot easier.
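
    If the input does turn out to be HTML, one possible approach is to apply the substitution to text tokens only and pass tags through untouched. This is a rough sketch using HTML::TokeParser (from the HTML-Parser distribution on CPAN, not a core module); the helper name tag_text_only and the sample markup are hypothetical:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: wrap "cat" in text nodes only, leaving markup alone.
sub tag_text_only {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my $out = '';
    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'T') {
            # Text token: safe to apply the case-preserving substitution.
            my $text = $token->[1];
            $text =~ s{\b(cat)\b}{<span class="topic">$1</span>}gi;
            $out .= $text;
        }
        elsif ($type eq 'S')  { $out .= $token->[4] }   # start tag, original source text
        elsif ($type eq 'E')  { $out .= $token->[2] }   # end tag, original source text
        elsif ($type eq 'PI') { $out .= $token->[2] }   # processing instruction
        else                  { $out .= $token->[1] }   # comment or declaration
    }
    return $out;
}

# The "cat" inside the href attribute is part of a tag, so it is not touched.
print tag_text_only('<a href="cat.html">Cat</a>'), "\n";
```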

Re: case preservation in regexp
by perlsen (Chaplain) on Feb 10, 2005 at 12:45 UTC

    Hi, just use c instead of a in your expression:

    $input = 'Cat cat';
    $input =~ s#\b([Cc]at)\b#<span class="topic">$1</a>#g;
    print $input;
    # output:
    # <span class="topic">Cat</a> <span class="topic">cat</a>
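
    For the "many patterns against many pieces of text" part of the question, one option is to join the terms into a single case-insensitive alternation and reuse the compiled regex. A minimal sketch, assuming the terms are plain strings; the term list and the tag_topics helper are hypothetical:

```perl
use strict;
use warnings;

my @topics = ('cat', 'dog', 'guinea pig');   # hypothetical term list

# Escape each term and try longer terms first, so a longer term
# wins over a shorter one that happens to be its prefix.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @topics;

# Compile once; /i matches any case, and $1 keeps the case that was found.
my $topic_re = qr/\b($alternation)\b/i;

sub tag_topics {
    my ($text) = @_;
    $text =~ s{$topic_re}{<span class="topic">$1</span>}g;
    return $text;
}

print tag_topics('Dog meets Cat and a Guinea Pig.'), "\n";
# <span class="topic">Dog</span> meets <span class="topic">Cat</span> and a <span class="topic">Guinea Pig</span>.
```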