in reply to Re: Re: Re: Re: Strip HTML tags again
in thread Strip HTML tags again
I first create the string $tagpattern by putting a "|" between all known HTML tags and surrounding the whole thing with parentheses. This gives something like "(a|p|code.....)", which the subroutine later uses to check for valid HTML tags.

    use HTML::Tagset;

    # Build one alternation out of every tag name HTML::Tagset knows about.
    my %tags = %HTML::Tagset::isKnown;
    my $tagpattern = "(" . join('|', keys %tags) . ")";
    print STDERR "$tagpattern\n";

    while (<>) {
        print strip_html_tags($_);
    }

    sub strip_html_tags {
        my $line = shift;
        # Match <tag ...>text</tag> for a known tag and keep only the text ($2).
        $line =~ s/<\s*$tagpattern(?:\s*>|\s+[^>]*>)([^<]*)<\s*\/\1[^>]*>/$2/ig;
        return $line;
    }
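
To see what the substitution does without installing anything, here is a minimal sketch of the same idea. Two things in it are my assumptions and not part of the code above: the short @tags list is a hypothetical stand-in for the keys of %HTML::Tagset::isKnown, and the "1 while" loop is an addition, since a single pass only removes a tag pair that has no other tag between its opening and closing tag.

```perl
use strict;
use warnings;

# Hypothetical short tag list, standing in for keys %HTML::Tagset::isKnown.
my @tags = qw(a b code i p);

# Same construction as above: "(a|b|code|i|p)".
my $tagpattern = '(' . join('|', @tags) . ')';

sub strip_html_tags {
    my $line = shift;
    # Each pass strips the innermost known tag pairs and keeps their text ($2).
    # Repeating until nothing matches also unwraps nested pairs (an addition,
    # not in the original subroutine).
    1 while $line =~ s/<\s*$tagpattern(?:\s*>|\s+[^>]*>)([^<]*)<\s*\/\1[^>]*>/$2/ig;
    return $line;
}

# Nested known tags are unwrapped from the inside out.
print strip_html_tags(qq{<p>See <a href="x.html">the <b>docs</b></a></p>\n});

# A tag that is not in the list is left alone.
print strip_html_tags(qq{<foo>this</foo>\n});
```

The first call prints "See the docs" and the second prints "<foo>this</foo>" unchanged. Note that an unknown tag inside a known one also protects the outer pair, because the captured text is not allowed to contain a "<".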
Re: Re: Re: Re: Re: Re: Strip HTML tags again
by dda (Friar) on Jul 01, 2002 at 13:38 UTC