in reply to Re^2: HTML::Parser fun
in thread HTML::Parser fun
I've had a gander at XML::LibXML but cannot see how to code it to be real-world HTML tolerant (so I can test it and see how tolerant it is).
You can't. At least not in Perl. XML::LibXML is a wrapper around libxml2, the C library that does both the XML and the HTML parsing. That's what you would need to change.
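If all you want is to see how tolerant libxml2 is, you can at least feed it broken HTML from Perl and look at what comes back. A minimal sketch, assuming an XML::LibXML recent enough to provide `load_html` and the `recover` parser option. The sample markup is made up, and the recovery itself still happens entirely inside libxml2:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sloppy HTML: unquoted attribute, unclosed <p> and <li>.
my $html = '<html><body><p class=intro>Hello<ul><li>one<li>two</ul></body></html>';

# recover => 2 asks libxml2 to keep going after errors without reporting them.
# Perl only passes the flag along; it does not change how libxml2 recovers.
my $doc = XML::LibXML->load_html( string => $html, recover => 2 );

# See what libxml2 made of the list items.
my @items = map { $_->textContent } $doc->findnodes('//li');
print scalar(@items), " li elements found: @items\n";
```

Comparing the resulting tree with what a browser builds from the same input gives a rough idea of where libxml2 gives up.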
For the record, when I wanted to add HTML parsing to XML::Twig, I looked at HTML::Parser, XML::LibXML and tidy, and settled on HTML::Parser as the most robust and easiest-to-use way to get well-formed XML out of random HTML.
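For anyone who wants to try the XML::Twig route, here is a minimal sketch. It assumes XML::Twig is installed along with HTML::TreeBuilder, which sits on top of HTML::Parser and is what `parse_html` relies on. The sample markup is made up:

```perl
use strict;
use warnings;
use XML::Twig;

# Made-up sloppy HTML: unquoted attribute, unclosed <p> and <li>.
my $html = '<html><body><p class=intro>Hello<ul><li>one<li>two</ul></body></html>';

my $twig = XML::Twig->new;

# parse_html first converts the HTML to well-formed XML, then parses
# that XML like any other document.
$twig->parse_html($html);

# From here on the usual XML::Twig navigation applies.
my @items = map { $_->text } $twig->root->descendants('li');
print scalar(@items), " li elements found: @items\n";
```

Once the document is loaded, `$twig->print` or `$twig->sprint` will give you the cleaned-up XML if that is all you are after.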
Re^4: HTML::Parser fun
by FreakyGreenLeaky (Sexton) on Jun 05, 2008 at 15:11 UTC