Re: UNO DOS
by lhoward (Vicar) on Sep 13, 2000 at 04:13 UTC
|
You might be better off using HTML::Parser to parse
out HTML tags, then apply your regular expression
on a tag-by-tag basis. It will be very difficult to get your
regular expression to work properly considering the variety
and complexity that can occur in an HTML document.
|
|
I would agree, unless this is the ONLY thing he wants to do.
Re: UNO DOS
by jreades (Friar) on Sep 13, 2000 at 03:29 UTC
|
Your problem is the .*? part. You rightly tried to limit the number of don't-care characters with '?', but the match still starts at the first '<!--' and grabs the smallest number of characters between it and 'DOS', which just happens to include another '-->' and '<!--', the pieces it isn't supposed to grab.
You'll need a few baselines to come up with a workable regexp:
- Can xxxx ever include '<!--' or '-->'? (We'd better hope not)
- Can xxxx contain only word-like characters (\w)?
- Or can it include space characters as well (\s)?
These would help you optimize your regexp...
But the key point is that you need to limit your regexp to a single comment group containing 'DOS'.
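To see the overshoot described above, here is a minimal sketch. The pattern is an assumption about what the original attempt looked like (the original post's regex is not quoted in this thread), and $overshoot is just an illustrative name:

```perl
use strict;
use warnings;

my $html = '<!--% xxxx UNO xxxx %--> <!--% xxxx DOS xxxx %-->';

# Assumed shape of the original attempt. The engine tries the leftmost
# '<!--%' first; the lazy .*? then grows one character at a time until
# ' DOS ' is found, crossing '%--> <!--%' on the way.
(my $overshoot = $html) =~ s/<!--%.*? DOS .*?%-->/GONE/s;

print "$overshoot\n";   # prints "GONE": both comments were swallowed
```

Laziness only shortens the match from its right end; it does not move the starting point to the nearest '<!--%'.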
I'd suggest using:
$html = '<!--% xxxx UNO xxxx %--> <!--% xxxx DOS xxxx %-->';
$html =~ s/<!--%([^->]+?) DOS ([^->]+)%-->/GONE/s;
print $html;
It's a little ugly, and notice that it assumes that your 'xxxx' can't contain '->', which may or may not be the case.
YMMV
|
|
Good call on the .* There is a node around here called Death to Dot Star! which explores this further. But your regex still needs work. The characters inside the brackets are not a group; each one is excluded individually. That means the class would stop at any HTML tag, not just at '-->', because it rejects the '>' on its own. Ok?
How about:
$html =~ s/<!--%(?!%-->)DOS(?!%-->)%-->/GONE/s;
I'm not sure about that regex; I've never used a zero-width negative look-ahead assertion, but I think that's the right direction.
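For what it's worth, the look-ahead as written consumes nothing, so that pattern can only match the literal text '<!--%DOS%-->'. One common way to make the idea work is to pair the look-ahead with a dot and repeat the pair. This is a sketch, not a tested fix for the real templates; the sample string and the names $inside and $fixed are made up for illustration:

```perl
use strict;
use warnings;

# Sample input with an ordinary HTML comment inside the second block.
my $html = '<!--% xxxx UNO xxxx %--> <!--% a <!-- note --> DOS b %-->';

# (?:(?!%-->).)* consumes one character at a time, but only while the
# closing brace '%-->' does not start at the current position, so the
# match can never run past the end of the block it started in.
my $inside = qr/(?:(?!%-->).)*/s;

(my $fixed = $html) =~ s/<!--%${inside}DOS${inside}%-->/GONE/;

print "$fixed\n";   # prints "<!--% xxxx UNO xxxx %--> GONE"
```

Because only '%-->' is excluded, a plain '-->' or a lone '>' inside the block does not stop the match.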
Re: UNO DOS
by Anonymous Monk on Sep 13, 2000 at 03:41 UTC
|
Yeah, that's part of the problem: the xxxx can contain HTML
code, which may contain regular comments (<!-- -->).
However, xxxx will never contain <!--% or %-->; those are
only used as braces.
And to mirod: another thing is that there could be any number
of these tags before or after the tag we're intending to grab.
RE: UNO DOS, HTML
by runrig (Abbot) on Sep 13, 2000 at 04:02 UTC
|
$html =~ s/(<!--%(.*?)%-->)/($2=~m|DOS|)? 'GONE' : $1/esg;
RE: UNO DOS, HTML
by mirod (Canon) on Sep 13, 2000 at 03:14 UTC
|
$html = '<!--% xxxx UNO xxxx %--> <!--% xxxx DOS xxxx %-->';
$html =~ s/(<!--.*?-->\s*)<!--%(.*?) DOS (.*?)%-->/$1GONE/s;
print $html;
Or make sure there is a comment beforehand:
$html = '<!--% xxxx UNO xxxx %--> <!--% xxxx DOS xxxx %-->';
$html =~ s/-->\s*<!--%(.*?) DOS (.*?)%-->/--> GONE/s;
print $html;
There is probably a cleaner way to do this without
capturing the first comment at all, or by using a
g modifier, skipping the first comment and
replacing the second.
Re: UNO DOS
by Anonymous Monk on Sep 13, 2000 at 04:35 UTC
|
well, sounds like I have to parse out each tag individually..
yucky
I guess I'll try to figure out some slob fix right now and rework the design of the templates..
|
|
Instead of reworking "the design of the templates", why
not use one of the many text/html templating
modules already in place?
|
|
My answer DOES parse each tag individually, it uses a regex inside a regex, and seems to work.
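Spelled out, the regex-inside-a-regex idea looks something like the sketch below. It is the same technique as the one-liner posted earlier in the thread, just unrolled; the three-block sample string and the names $block, $body and $out are made up for illustration:

```perl
use strict;
use warnings;

my $html = '<!--% a UNO b %--> <!--% c DOS d %--> <!--% e DOS f %-->';

# Outer regex: visit every <!--% ... %--> block in turn
# ($1 = whole block, $2 = its body). /g repeats for each block,
# /s lets . match newlines, /e runs the replacement as Perl code.
# Inner regex: only blocks whose body mentions DOS become 'GONE';
# every other block is put back unchanged. $1 and $2 are copied
# into lexicals before the inner match runs.
(my $out = $html) =~ s{(<!--%(.*?)%-->)}{
    my ($block, $body) = ($1, $2);
    $body =~ /DOS/ ? 'GONE' : $block;
}esg;

print "$out\n";   # prints "<!--% a UNO b %--> GONE GONE"
```

This relies on the guarantee given earlier in the thread that xxxx never contains '<!--%' or '%-->', so the lazy (.*?) always stops at the end of the block it started in.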
|
|
yeah I got it.. I was trying to avoid that, but I guess I can't.. (?)
thanks tho