BUU has asked for the wisdom of the Perl Monks concerning the following question:

Basically, what I'm trying to do is find and replace all the "nonvalid tags" in an HTML document. By nonvalid I mean this: tags like <q></q> and <html></html> are valid, while tags like <foo></foo>, <baz></baz>, and <tmpl></tmpl> are nonvalid and are the ones I would replace. Is there a module that would do this, or do I really have to go write my own?

Re: Finding and replacing 'nonvalid' html tags?
by gav^ (Curate) on Jun 04, 2002 at 21:20 UTC
    You might want to investigate HTML::TagFilter which makes this pretty easy.

    Or simply use HTML::Parser: write a handler for start/end tags and output only the ones you want (examples are here).

    Update:
    If you want a list of valid HTML tags, look at HTML::Tagset and its %HTML::Tagset::isKnown hash.

    gav^
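
A minimal sketch of the HTML::Parser route combined with %HTML::Tagset::isKnown, in case it helps to see the two suggestions together. It is not code from the reply above: the helper name strip_unknown_tags and the sample input are made up for illustration, and it assumes HTML::Parser and HTML::Tagset are installed from CPAN. Tags whose names are missing from %HTML::Tagset::isKnown are dropped, while everything else, including the text inside the dropped tags, is passed through unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();   # CPAN: event-driven HTML tokenizer
use HTML::Tagset ();   # CPAN: provides %HTML::Tagset::isKnown

# Hypothetical helper (not part of either module): returns $html with
# every start/end tag removed whose element name HTML::Tagset does not know.
sub strip_unknown_tags {
    my ($html) = @_;
    my $out = '';

    # Shared handler for start and end tags. 'tagname' is the lowercased
    # element name without a leading slash; 'text' is the original tag source.
    my $tag_h = sub {
        my ($tagname, $text) = @_;
        $out .= $text if $HTML::Tagset::isKnown{$tagname};
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ $tag_h, 'tagname, text' ],
        end_h       => [ $tag_h, 'tagname, text' ],
        # Text, comments, declarations, etc. are copied through untouched.
        default_h   => [ sub { $out .= shift }, 'text' ],
    );
    $p->parse($html);
    $p->eof;    # flush anything still buffered
    return $out;
}

# <foo> is not a known HTML element, so only its tags are removed.
print strip_unknown_tags('<p>Hi <foo>there</foo> <b>you</b></p>'), "\n";
```

To replace rather than delete the unknown tags, the tag handler could append some substitute text (for example an escaped version of the tag) in the branch where the lookup fails.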