HTML::TreeBuilder is a good answer. It is fairly tolerant of missing close tags and can generate clean HTML output on request. You may also be interested in HTML::Lint, which parses HTML and generates an error report.
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Lint;

# Slurp the sample HTML from the __DATA__ section
my $html = do {local $/; (<DATA>)};

# Report structural problems only (unclosed or missing tags)
my $lint = HTML::Lint->new (only_types => HTML::Lint::Error::STRUCTURE);
$lint->parse ($html);
$lint->eof ();
print "HTML::Lint report:\n";
print join "\n", map {$_->as_string ()} $lint->errors ();

# Let TreeBuilder repair the markup and write it back out
my $tree = HTML::TreeBuilder->new ();
$tree->parse ($html);
$tree->eof ();
print "\n\nTreeBuilder cleaned up HTML\n";
print $tree->as_HTML ();

__DATA__
<p><b><i>test</b></p>
Prints:
HTML::Lint report:
(1:14) <i> at (1:7) is never closed
(1:18) <body> tag is required
(1:18) <head> tag is required
(1:18) <html> tag is required
(1:18) <title> tag is required

TreeBuilder cleaned up HTML
<html><head></head><body><p><b><i>test</i></b></body></html>
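Note that the cleaned-up output above has no closing </p>: by default as_HTML leaves out end tags that HTML treats as optional. If every element should be explicitly closed, passing an empty hash reference as the third argument to as_HTML should force all end tags to be written. The following is a minimal sketch, not part of the original script; the helper name balanced_html is made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse possibly unbalanced HTML and return markup
# in which every non-empty element has an explicit end tag.
sub balanced_html {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new ();
    $tree->parse ($html);
    $tree->eof ();

    # Arguments: entities to encode (undef = default set), indent string
    # (undef = no indenting), and the hash of optional end tags. An empty
    # hash means no end tag is treated as optional.
    my $out = $tree->as_HTML (undef, undef, {});

    $tree->delete ();    # release the tree explicitly
    return $out;
}

print balanced_html ('<p><b><i>test</b></p>'), "\n";
```

With the sample input from above, the result should now include the </p> that the default call omitted. The exact whitespace may vary between module versions.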
In reply to Re: Ensuring HTML is "balanced" by GrandFather, in thread Ensuring HTML is "balanced" by skx