I have not used it extensively, but another module that looks really neat for parsing and "tidying" HTML is Marpa-HTML. Its html_fmt demo handles missing start and end tags, and the dist's documentation talks about being able to selectively eliminate certain types of tags.