Regex grabs too much

raflach has asked for the wisdom of the Perl Monks concerning the following question:

What I'm trying to do is grab HTML code and remove certain items from it. The HTML is actually a template that includes variables that will later be replaced with data by a Perl script. The replacement variables look like (~ something ~)
Sometimes they may be inside of a form element, as a value... Sometimes not.
So how do I get a regex that will replace this:

<input type="button" value="(~ something ~)">

with

(~ something ~)

without also causing this:

<p>Date: (~ date ~)</p>

to be replaced with

(~ date ~)

Anyone?

Replies are listed 'Best First'.
Re: Regex grabs too much by swiftone (Curate) on Jun 05, 2000 at 20:23 UTC
First, I would recommend you look into Text::Template. Everything you are trying to do here has been done better than I could do it :) Second, try: `$line =~ s/~\s([^~]*?) ~/$$1/ge;` The /e causes the right-hand side of the s/// to be evaluated as Perl code, so `$$1` is treated as a symbolic reference to the variable named by the capture. The ? after the * makes it non-greedy, leaving the trailing space for the rest of the pattern to match.
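A minimal sketch of the same substitution idea, with one deliberate change: it looks values up in a hash instead of using a symbolic reference, because symbolic references are rejected under `use strict`. The `fill_placeholders` helper and the sample data are hypothetical, not from the thread, and the pattern here also consumes the surrounding parentheses of the `(~ name ~)` form shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace each "(~ name ~)"
# placeholder with the matching value from a hash reference.
# Placeholders with no matching key are left exactly as they were.
sub fill_placeholders {
    my ($line, $values) = @_;
    $line =~ s/\(~\s([^~]*?) ~\)/exists $values->{$1} ? $values->{$1} : "(~ $1 ~)"/ge;
    return $line;
}

# Illustrative data only.
my %values = (date => '2000-06-05');

print fill_placeholders('<p>Date: (~ date ~)</p>', \%values), "\n";
# prints: <p>Date: 2000-06-05</p>
```

Leaving unknown placeholders untouched is a design choice for this sketch; a real template system might warn or die instead.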
Re: Regex grabs too much by raflach (Pilgrim) on Jun 05, 2000 at 20:38 UTC
Sorry, I should have included code. `$data =~ s/<.*?(\(~ .*? ~\)).*?>/$1/g` was my first attempt, and it grabbed way, way too much. `$data =~ s/<.{1,10}(\(~ .*? ~\)).{1,10}>/$1/g` came closer but still wouldn't have worked in the example I gave. What I want is for `<input type="button" value="(~ something ~)">` to be converted to just the variable, removing the form element around it, and for `<p> date: (~ something ~) </p>` to be left exactly as it is.
RE: Re: Regex grabs too much by lhoward (Vicar) on Jun 05, 2000 at 20:56 UTC
It sounds like you could express what you want as follows: if one of my special (~ ... ~) tags is found inside an HTML tag, replace that whole HTML tag with my special tag. Otherwise leave it alone. In that case the following code should do the trick: `$data =~ s/<[^<>]*?(\(~ .*? ~\))[^<>]*?>/$1/g;` Comment added in response to the following message: `[^<>]` is a character class that matches any single character other than < or >. The perlre documentation has more details on how this works. My RE works by looking for the opening < of an HTML tag, then 0 or more non < > characters, then the special tag, then 0 or more non < > characters, then the > that closes the original HTML tag.
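To see the difference in practice, here is a small sketch that runs the character-class pattern and the earlier `.*?` attempt against the two sample lines from the question. The helper names `strip_enclosing_tag` and `strip_with_dot` are made up for this illustration.

```perl
use strict;
use warnings;

# Sample template lines taken from the question.
my $form_line = '<input type="button" value="(~ something ~)">';
my $para_line = '<p>Date: (~ date ~)</p>';

# Hypothetical helper: replace an HTML tag that contains a (~ ... ~)
# placeholder with the placeholder alone. [^<>] cannot cross a tag
# boundary, so the match has to stay inside a single tag.
sub strip_enclosing_tag {
    my ($data) = @_;
    $data =~ s/<[^<>]*?(\(~ .*? ~\))[^<>]*?>/$1/g;
    return $data;
}

# Hypothetical helper using the first attempt, for comparison: the dot
# matches < and > too, so the match can run across several tags.
sub strip_with_dot {
    my ($data) = @_;
    $data =~ s/<.*?(\(~ .*? ~\)).*?>/$1/g;
    return $data;
}

print strip_enclosing_tag($form_line), "\n";  # prints: (~ something ~)
print strip_enclosing_tag($para_line), "\n";  # prints: <p>Date: (~ date ~)</p>
print strip_with_dot($para_line), "\n";       # prints: (~ date ~)
```

The last line shows the "grabs too much" behaviour: with `.*?` the match starts at the `<` of `<p>` and ends at the `>` of `</p>`, so the whole paragraph collapses to the placeholder.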
RE: RE: Re: Regex grabs too much by raflach (Pilgrim) on Jun 05, 2000 at 21:02 UTC
That worked perfectly! Can you explain what `[^<>]` is actually doing? That seems to be the only difference between your code and mine, and it confuses me considerably.
RE: RE: RE: Re: Regex grabs too much by Anonymous Monk on Jun 05, 2000 at 21:45 UTC
RE: RE: RE: Re: Regex grabs too much by mikfire (Deacon) on Jun 05, 2000 at 21:41 UTC