in reply to Regex grabs too much

Sorry, I should have included the code.
$data =~ s/<.*?(\(~ .*? ~\)).*?>/$1/g;
was my first attempt, and it grabbed way too much.
$data =~ s/<.{1,10}(\(~ .*? ~\)).{1,10}>/$1/g;
came closer, but still wouldn't have worked on the example I gave.

What I want is for
<input type="button" value="(~ something ~)">
to be converted to just the variable, removing the form element around it, and for
<p> date: (~ something ~) </p>
to be left exactly as it is.

RE: Re: Regex grabs too much
by lhoward (Vicar) on Jun 05, 2000 at 20:56 UTC
    It sounds like you could express what you want as follows:
    If one of my special (~ ... ~) tags is found inside an HTML tag, replace that whole HTML tag with my special tag. Otherwise leave it alone.
    In that case the following code should do the trick:
    $data =~ s/<[^<>]*?(\(~ .*? ~\))[^<>]*?>/$1/g;

    comment added in response to the following message

    [^<>]
    That is a negated character class: it matches any single character that is not < or >. The perlre documentation has more details on how this works. My RE works by looking for the opening < of an HTML tag, then 0 or more non < > characters, then the special tag, then 0 or more non < > characters, then the > that closes the original HTML tag.
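A minimal sketch of the difference, run against the two sample lines from the question. The sub names first_attempt and strip_wrapping_tag are made up for this illustration and are not part of the original posts; the two substitutions themselves are copied from the thread.

```perl
use strict;
use warnings;

# First attempt from the question: . can cross > and <, so the
# match is free to run from one tag into the next.
sub first_attempt {
    my ($data) = @_;
    $data =~ s/<.*?(\(~ .*? ~\)).*?>/$1/g;
    return $data;
}

# Version with [^<>]: the text on either side of the special tag
# must stay inside a single < ... > pair.
sub strip_wrapping_tag {
    my ($data) = @_;
    $data =~ s/<[^<>]*?(\(~ .*? ~\))[^<>]*?>/$1/g;
    return $data;
}

my $button    = '<input type="button" value="(~ something ~)">';
my $paragraph = '<p> date: (~ something ~) </p>';

print strip_wrapping_tag($button), "\n";     # (~ something ~)
print strip_wrapping_tag($paragraph), "\n";  # <p> date: (~ something ~) </p>
print first_attempt($paragraph), "\n";       # (~ something ~)
```

The last line shows the "grabs too much" problem: with the first attempt, the whole paragraph from <p> through </p> is treated as one match and replaced by the special tag.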
      That worked perfectly! Can you explain what
      [^<>]
      actually is doing? That seems to be the only difference between your code and mine, and it confuses me considerably.
        It means "not one of those two characters"

        viva el perl libre

        I am not an lhoward and I do not play one on TV, but I can explain that bit of regex.

        lhoward has defined a set, as indicated by the [ ]. When perl's regex engine sees this, any single character listed within the brackets will be considered a match. However, lhoward was tricky and made the first character a ^. When the first character of a set is a ^, it negates the set ( mathematicians call it the complement, but I can't spell 'complement' ) and tells the regex to match any single character that is not in the set.

        [^<>]
        is saying to match anything that isn't a < or a >
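A small sketch of the set and its negation side by side. The sample string and variable names are invented for this illustration only.

```perl
use strict;
use warnings;

my $text = 'a<b>c';

# [<>] matches one character that is either < or >.
my @in_set = $text =~ /[<>]/g;

# [^<>] matches one character that is neither < nor >.
my @not_in_set = $text =~ /[^<>]/g;

print "in set:     @in_set\n";      # in set:     < >
print "not in set: @not_in_set\n";  # not in set: a b c
```

Note that each set still matches exactly one character at a time; it is the *? after it in lhoward's regex that lets it cover a run of them.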

        HTH,
        mikfire