Finding and replacing 'nonvalid' html tags?

BUU has asked for the wisdom of the Perl Monks concerning the following question:

Basically, what I'm trying to do is find and replace all the "nonvalid tags" in an HTML document. By nonvalid I mean that something like <q></q> or <html></html> is valid, while stuff like <foo></foo>, <baz></baz>, or <tmpl></tmpl> are nonvalid tags that I would replace. Is there any module that would do this, or do I really have to go write my own?


Re: Finding and replacing 'nonvalid' html tags? by gav^ (Curate) on Jun 04, 2002 at 21:20 UTC
You might want to investigate HTML::TagFilter, which makes this pretty easy. Or simply use HTML::Parser, writing a handler for start/end tags and only outputting the ones you want (examples are here). Update: If you want a list of valid HTML tags, you could look at HTML::Tagset and `%HTML::Tagset::isKnown`.
gav^
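For illustration, here is a minimal sketch of the HTML::Parser approach described above, using `%HTML::Tagset::isKnown` as the list of valid tags. The helper name `strip_unknown_tags` is made up for this example, and it assumes that "replacing" a nonvalid tag means dropping the tag itself while keeping the text inside it. Treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Tagset ();

# Hypothetical helper (not part of any module): returns $html with every
# start/end tag that HTML::Tagset does not know removed. Text between the
# removed tags is kept.
sub strip_unknown_tags {
    my ($html) = @_;
    my $out = '';

    # Shared handler for start and end tags: emit the original tag text
    # only when the (lowercased) tag name is in %HTML::Tagset::isKnown.
    my $tag_h = sub {
        my ($tagname, $text) = @_;
        $out .= $text if $HTML::Tagset::isKnown{$tagname};
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ $tag_h, 'tagname, text' ],
        end_h       => [ $tag_h, 'tagname, text' ],
        # Everything else (text, comments, declarations) passes through.
        default_h   => [ sub { $out .= shift }, 'text' ],
    );
    $p->parse($html);
    $p->eof;

    return $out;
}

# Example call: <foo> and <tmpl> are not known HTML tags, <p> is.
print strip_unknown_tags('<p>Hello <foo>world</foo> <tmpl>!</tmpl></p>'), "\n";
```

To substitute something else for a nonvalid tag instead of dropping it, the handler could append a replacement string in the branch where the lookup fails.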