in reply to UNO DOS, HTML

Your problem is the .*? Although you rightly made it non-greedy with '?', the match still starts at the first '<!--' and then takes the smallest number of don't-care characters needed to reach 'DOS'. That stretch just happens to include another '-->' and '<!--', the pieces it isn't supposed to grab.
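A minimal sketch of that failure. The exact pattern from the question isn't quoted here, so the substitution below is only a guess at its shape; the sample string is the two-comment one used further down in this post.

```perl
use strict;
use warnings;

# Sample input: two comments, only the second one mentions DOS.
my $html = '<!--% xxxx UNO xxxx %--> <!--% xxxx DOS xxxx %-->';

# Hypothetical pattern (an assumption, not quoted from the question).
# The match is anchored at the FIRST '<!--%'; the lazy .*? then grows
# until 'DOS' is found, crossing the first '%-->' on the way.
(my $result = $html) =~ s/<!--%.*?DOS.*?%-->/GONE/s;

print "$result\n";    # prints "GONE": both comments were swallowed
```

Non-greedy only shortens the match from its right-hand end; it never moves the starting point forward to the second comment.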

You'll need a few baselines to come up with a workable regexp:

These would help you optimize your regexp...

But the key point is that you need to limit your regexp to a single comment group containing 'DOS'.

I'd suggest using:

    $html = '<!--% xxxx UNO xxxx %--> <!--% xxxx DOS xxxx %-->';
    $html =~ s/<!--%([^->]+?) DOS ([^->]+)%-->/GONE/s;
    print $html;

It's a little ugly, and notice that it assumes that your 'xxxx' can't contain '->', which may or may not be the case.
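To probe that assumption, here is a small sketch with two made-up inputs (both strings are illustrative, not taken from the original page). Note that a character class excludes each listed character on its own, so even a lone '-' in the content is enough to stop the match.

```perl
use strict;
use warnings;

# Same pattern as the suggestion above, compiled once.
my $re = qr/<!--%([^->]+?) DOS ([^->]+)%-->/s;

# Illustrative inputs (made up for this sketch).
my $plain  = '<!--% xxxx DOS xxxx %-->';
my $hyphen = '<!--% foo-bar DOS xxxx %-->';

(my $plain_out  = $plain)  =~ s/$re/GONE/;
(my $hyphen_out = $hyphen) =~ s/$re/GONE/;

print "$plain_out\n";     # GONE
print "$hyphen_out\n";    # unchanged: the '-' in foo-bar blocks the match
```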

YMMV

Replies are listed 'Best First'.
RE: Re: UNO DOS
by Adam (Vicar) on Sep 13, 2000 at 04:26 UTC
    Good call on the .* There is a node around here called Death to Dot Star! which explores this further. But your regex still needs work. The bracketed characters are not a group; each one is excluded individually. That means [^->] stops at any '-' or any '>', not just at '-->', so it would trip over any HTML item inside the comment, because it matches the > alone. Ok?
    How about:
    $html =~ s/<!--%(?!%-->)DOS(?!%-->)%-->/GONE/s;
    I'm not sure about that regex, I've never used a zero-width negative look-ahead assertion, but I think that's the right direction.
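For what it's worth, a look-ahead consumes nothing by itself: as written, (?!%-->)DOS only checks that the text at that point doesn't begin with '%-->', so the pattern above can match nothing but the literal '<!--%DOS%-->'. One way to make the idea work is to pair the look-ahead with a dot and repeat the pair. This is a sketch under that assumption, with a made-up input that has a hyphen and a tag inside the comment:

```perl
use strict;
use warnings;

# Illustrative input: the second comment contains '-' and '>' characters.
my $html = '<!--% xxxx UNO xxxx %--> <!--% foo-bar DOS <b>x</b> %-->';

# (?:(?!%-->).)* consumes one character at a time, but only while the
# text at that position is not the start of '%-->'. The match therefore
# cannot run past the end of the comment it started in.
(my $result = $html) =~ s/<!--%(?:(?!%-->).)*?DOS(?:(?!%-->).)*%-->/GONE/s;

print "$result\n";    # <!--% xxxx UNO xxxx %--> GONE
```

The first comment is left alone because the repeated group hits its '%-->' before finding 'DOS', and the engine then retries from the second '<!--%'.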

      I've been sweating buckets about this one ever since I left the office... <visions of minus 30XP dancing in my head > which is, of course, just the time to realize that you screwed up the regexp. :^P