in reply to Ensuring HTML is "balanced"
HTML::TreeBuilder is a good answer. It is pretty tolerant of missing close tags and can generate nice HTML output if you ask it nicely. You may also be interested in HTML::Lint, which parses HTML and generates an error report.
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Lint;

my $html = do {local $/; (<DATA>)};

my $lint = HTML::Lint->new (only_types => HTML::Lint::Error::STRUCTURE);

$lint->parse ($html);
$lint->eof ();

print "HTML::Lint report:\n";
print join "\n", map {$_->as_string ()} $lint->errors ();

my $tree = HTML::TreeBuilder->new ();

$tree->parse ($html);
$tree->eof ();

print "\n\nTreeBuilder cleaned up HTML\n";
print $tree->as_HTML ();

__DATA__
<p><b><i>test</b></p>
Prints:
HTML::Lint report:
(1:14) <i> at (1:7) is never closed
(1:18) <body> tag is required
(1:18) <head> tag is required
(1:18) <html> tag is required
(1:18) <title> tag is required

TreeBuilder cleaned up HTML
<html><head></head><body><p><b><i>test</i></b></body></html>
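Note that the cleaned-up output has no closing </p>: by default as_HTML leaves out end tags that HTML treats as optional. If you want every element closed explicitly, a minimal sketch (assuming the same test fragment, with the variable name $clean chosen here purely for illustration) is to pass an empty hash reference as the third argument to as_HTML, and an indent string as the second if you want the output laid out across lines:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Same unbalanced fragment as in the example above.
my $html = '<p><b><i>test</b></p>';

my $tree = HTML::TreeBuilder->new ();

$tree->parse ($html);
$tree->eof ();

# as_HTML (entities, indent, optional_end_tags):
#   undef - use the default entity encoding
#   '  '  - indent nested elements by two spaces
#   {}    - treat no end tag as optional, so </p> and friends are written out
my $clean = $tree->as_HTML (undef, '  ', {});

print $clean;

$tree->delete ();    # release the tree explicitly
```

The exact whitespace of the indented output depends on the HTML::Element version in use, so treat the layout as approximate; the point is only that the empty hash turns off end tag omission.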