It's invalid because there's no space before -->
That's bogus. There's no need for a space there, nor does a space have to be the first character following a COM sequence (COM being --).

OTOH, your pattern falsely considers <!-- -- --> to be a valid comment, while it doesn't consider <!-- <!-- --> --> to be valid.

This matches HTML comments:

<!(?:--(?:[^-]*(?:-[^-]+)*)--\s*)*>
although if you are truly pedantic, you'd replace the \s with the set of characters the HTML DTD defines as white space characters.
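
For anyone who wants to poke at the pattern, here is a minimal sketch that runs it against the cases discussed in this thread. It assumes the regex carries over unchanged to Python's `re` module. The names `COMMENT_DECL`, `is_comment_declaration` and `strip_comments` are made up for illustration and are not from the original post.

```python
import re

# The pattern from the post, ported unchanged to Python's `re` syntax (assumption).
# Note that [^-] also matches newlines, so multi-line comments are covered.
COMMENT_DECL = re.compile(r"<!(?:--(?:[^-]*(?:-[^-]+)*)--\s*)*>")


def is_comment_declaration(text: str) -> bool:
    """True if the whole string is a single comment declaration (hypothetical helper)."""
    return COMMENT_DECL.fullmatch(text) is not None


def strip_comments(html: str) -> str:
    """Delete every comment declaration the pattern finds (hypothetical helper)."""
    return COMMENT_DECL.sub("", html)


if __name__ == "__main__":
    samples = [
        "<!--no space before the close-->",
        "<!-- two -- -- comments -->",
        "<!-- <!-- --> -->",
        "<!-- -- -->",
    ]
    for sample in samples:
        print(f"{sample!r}: {is_comment_declaration(sample)}")
    print(strip_comments("before<!-- gone -->after"))
```

The first three samples are accepted and the fourth is rejected, which lines up with the claims above. Because the group of comments is optional, the pattern also accepts the empty declaration `<!>`, so `strip_comments` removes that too.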

Re^3: HTML stripper...
by kcott (Archbishop) on Nov 22, 2010 at 23:57 UTC

    Firstly, I've added an update to my post, please read that.

    Secondly, rather than just stating "That's bogus ...", perhaps you could cite a reference.

    -- Ken

      perhaps you could cite a reference.
      Rules 91 and 92 of ISO 8879 (SGML).

      Charles F. Goldfarb: The SGML Handbook. Oxford: Oxford University Press. 1990. ISBN 0-19-853737-9. Ch. 10.3, p. 390.