in reply to Remove all html tag Except 'sup'

A single regex is a poor approach for the general case, but since you asked for it, I'll try:
my $tag = qr{
    <(?>/?)     # tag start
    (?!sup)     # not a <sup> or </sup> tag
    [^>]*       # everything but the tag end
    >           # end of tag
}xi;
$str =~ s/$tag//g;

This is untested and probably a bad idea, but you asked for it ;-)

Update: fixed the regex to preserve the closing tag. Stupid me. It tried to match /sup, failed, backtracked, and matched that whole substring with the [^>]* rule. A non-backtracking group around /? prevents that. In Perl 5.10 you could also say /?+ instead.
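
To make the backtracking issue concrete, here is a small sketch comparing the pattern with and without the non-backtracking group. The sample string and the variable names ($html, $naive, $atomic) are made up for illustration and are not from the original question:

```perl
use strict;
use warnings;

# Illustrative input, not from the original question.
my $html = 'H<sub>2</sub>O, x<sup>2</sup>, <b>bold</b>';

# Plain /? : after the lookahead fails on "sup", the engine backtracks,
# lets /? match nothing, and [^>]* then swallows "/sup".
my $naive  = qr{ < /?     (?!sup) [^>]* > }xi;

# Atomic (?>/?) : once the slash is taken it cannot be given back,
# so </sup> does not match and is left alone.
my $atomic = qr{ < (?>/?) (?!sup) [^>]* > }xi;

(my $naive_out  = $html) =~ s/$naive//g;
(my $atomic_out = $html) =~ s/$atomic//g;

print "naive:  $naive_out\n";   # naive:  H2O, x<sup>2, bold
print "atomic: $atomic_out\n";  # atomic: H2O, x<sup>2</sup>, bold
```

Note that this only shows the closing-tag fix. Both versions still keep any tag whose name merely starts with "sup".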

Re^2: Remove all html tag Except 'sup'
by jai_dgl (Beadle) on Jun 20, 2008 at 10:09 UTC
    Hi, it's working fine, but it's not preserving the closing </sup> tag.
      It's missing grouping parens.
      # you need an extra (?:)
      $tag = qr{</?(?:(?!sup)[^>])*>}i;
        That will also keep <supfoo> and <foosup>. A partial improvement may be
        $tag = qr{</?(?:(?![< ]sup[ >])[^>])*>}i;
        It works, however I'm not terribly sure about why it works...
      You're right, I updated my regex - should work now.
        It works, but it will miss some tags like < sup > and </ sup >. If you add grouping like this, it will get them all :)