raflach has asked for the wisdom of the Perl Monks concerning the following question:

What I'm trying to do is grab HTML code and remove certain items from it. The HTML is actually a template that includes variables that will later be replaced with data by a Perl script. The replacement variables look like (~ something ~).
Sometimes they may be inside of a form element, as a value... Sometimes not.
So how do I get a regex that will replace this:
<input type="button" value="(~ something ~)">
with
(~ something ~)
without also causing this:
<p>Date: (~ date ~)</p>
to be replaced with
(~ date ~)
Anyone?

Replies are listed 'Best First'.
Re: Regex grabs too much
by swiftone (Curate) on Jun 05, 2000 at 20:23 UTC
    First, I would recommend you look into Text::Template. Everything you are trying to do here has been done better than I could do it :)

    Second, try:

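    $line =~ s/~\s*([^~]*?) ~/$$1/ge;
    The /e causes the right-hand side of the s/// to be evaluated as Perl code, so $$1 is treated as a symbolic reference to the variable named by $1. The ? after the * makes the match non-greedy, so the capture stops before the space that precedes the closing ~.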
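As a hedged alternative to the symbolic reference above (which needs package variables and does not run under "use strict 'refs'"), here is a minimal sketch that looks names up in a hash instead. The %vars hash and its values are hypothetical, and the pattern is widened to consume the surrounding parentheses as well; this is not the method from the post, only one way it could be done.

```perl
use strict;
use warnings;

# Hypothetical template values; the thread does not say where the data comes from.
my %vars = (something => 'Click me', date => '2000-06-05');

my $line = '<p>Date: (~ date ~)</p>';

# Replace each (~ name ~) marker with its value from %vars.
# Markers whose name is not in %vars are put back unchanged.
$line =~ s/\(~\s*([^~]*?) ~\)/exists $vars{$1} ? $vars{$1} : "(~ $1 ~)"/ge;

print "$line\n";   # prints "<p>Date: 2000-06-05</p>"
```

A hash keeps the template names separate from the program's own variables, so a template cannot reach arbitrary globals.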
Re: Regex grabs too much
by raflach (Pilgrim) on Jun 05, 2000 at 20:38 UTC
    Sorry, I should have included code.
    $data =~ s/<.*?(\(~ .*? ~\)).*?>/$1/g;
    was my first attempt, and grabbed way way way too much.
    $data =~ s/<.{1,10}(\(~ .*? ~\)).{1,10}>/$1/g;
    came closer but still wouldn't have worked in the example I gave.

    What I want is for
    <input type="button" value="(~ something ~)">
    to be converted to just the variable, removing the form element around it, and for
    <p> date: (~ something ~) </p>
    to be left exactly as it is.
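To make the failure concrete, the two attempts can be run against the two sample lines from the question. This is only an illustrative sketch; the variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my $para  = '<p>Date: (~ date ~)</p>';
my $input = '<input type="button" value="(~ something ~)">';

# First attempt: "." also matches "<" and ">", so one match can start
# at the "<" of <p> and end at the ">" of </p>, swallowing both tags.
(my $first_try = $para) =~ s/<.*?(\(~ .*? ~\)).*?>/$1/g;

# Second attempt: at most 10 characters are allowed between "<" and the
# marker, but the <input ...> tag has 27 there, so nothing matches.
(my $second_try = $input) =~ s/<.{1,10}(\(~ .*? ~\)).{1,10}>/$1/g;

print "$first_try\n";    # prints "(~ date ~)"
print "$second_try\n";   # prints the <input ...> line unchanged
```

The first pattern is too loose for the paragraph case, and the second is too tight for the form-element case.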
      It sounds like you could express what you want as follows:
      If one of my special (~ ... ~) tags is found inside an HTML tag, replace that whole HTML tag with my special tag. Otherwise leave it alone.
      In that case the following code should do the trick:
      $data =~ s/<[^<>]*?(\(~ .*? ~\))[^<>]*?>/$1/g;

      comment added in response to the following message

      [^<>]
      That is a negated character class: it matches any single character that is not < or >. The perlre documentation has more details on how this works. My RE works by looking for the opening < of an HTML tag, then 0 or more characters that are neither < nor >, then the special tag, then 0 or more characters that are neither < nor >, then the > that closes the original HTML tag. Because the match can never cross a < or >, it stays inside a single HTML tag.
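A short check of that regex against both sample lines from the question, written as a sketch; the sample string and variable names are assumptions made for the demonstration.

```perl
use strict;
use warnings;

# Both cases from the question, joined into one hypothetical template.
my $data = join "\n",
    '<input type="button" value="(~ something ~)">',
    '<p>Date: (~ date ~)</p>';

# "<", non-bracket characters, the (~ ... ~) marker, more non-bracket
# characters, then ">". [^<>] cannot step over a tag boundary.
(my $clean = $data) =~ s/<[^<>]*?(\(~ .*? ~\))[^<>]*?>/$1/g;

print "$clean\n";
# prints:
# (~ something ~)
# <p>Date: (~ date ~)</p>
```

The form element collapses to its marker, while the paragraph is left alone because its marker sits between tags rather than inside one.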
        That worked perfectly! Can you explain what
        [^<>]
        actually is doing? That seems to be the only difference between your code and mine, and it confuses me considerably.