I've had a gander at XML::LibXML but cannot see how to code it to be real-world HTML tolerant (so I can test it and see how tolerant it is).
You can't. At least not in Perl. XML::LibXML is a binding to libxml2, the C library that does both the XML and the HTML parsing, so how tolerant it is gets decided there. That's what you would need to change.
For the record, when I wanted to add HTML parsing to XML::Twig, I looked at HTML::Parser, XML::LibXML and tidy, and settled on HTML::Parser as the most robust and easiest-to-use way to get well-formed XML out of random HTML.
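
If you just want to see how tolerant libxml2's HTML parser is as shipped, something like this rough sketch might do. It feeds a deliberately broken snippet to `load_html` with the `recover` option turned on and prints whatever comes back. The helper name `try_parse_html` and the sample markup are made up for illustration, and what you get will depend on the libxml2 version you have installed.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): parse an HTML string with
# libxml2's HTML parser in recover mode and return the serialized document,
# or undef if the parser gave up entirely.
sub try_parse_html {
    my ($html) = @_;
    my $doc = eval {
        XML::LibXML->load_html(
            string          => $html,
            recover         => 1,    # keep parsing after errors
            suppress_errors => 1,    # do not print libxml2 error messages
        );
    };
    return defined $doc ? $doc->toString : undef;
}

# Made-up sample: unclosed <p>, <b> and <a>, plus an unquoted attribute.
my $messy = '<p>first<p>second <b>bold <a href=foo.html>link</p>';

my $out = try_parse_html($messy);
print defined $out ? $out : "parse failed\n";
```

Run it over a pile of real pages and compare with what HTML::Parser gives you for the same input. Anything the recover mode gets wrong is libxml2's behaviour, and this only lets you observe it, not change it.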
In reply to Re^3: HTML::Parser fun by mirod, in thread HTML::Parser fun by FreakyGreenLeaky