in reply to Problem timing out XML::LibXML parsing

Can HTML::Parser deal with this code? And do you really need to use XML::LibXML? If the answers are yes and no, you can use HTML::TreeBuilder (and HTML::TreeBuilder::XPath for very useful XPath support). Or use XML::Twig, which uses HTML::TreeBuilder to wrestle XML out of the HTML.
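A minimal sketch of the HTML::TreeBuilder::XPath route, in case it helps. The helper name xpath_values and the sample markup are made up for illustration; only new, parse, eof, findvalues and delete come from the modules themselves.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not part of any module): parse possibly broken
# HTML and return the string value of every node matching $xpath.
sub xpath_values {
    my ($html, $xpath) = @_;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);    # tolerant HTML parsing, unclosed tags are fine
    $tree->eof;             # tell the parser the document is complete

    my @values = $tree->findvalues($xpath);

    $tree->delete;          # free the tree explicitly
    return @values;
}

# Made-up sample: neither <p> is closed, and <b> is left open too.
my $html = '<html><body><p>first<p>second <b>bold</body>';
print "$_\n" for xpath_values($html, '//p');
```

The tree is built by HTML::Parser underneath, so this only pays off if HTML::Parser copes with the problem pages in the first place.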

Otherwise you could use HTML::Tidy, or just plain tidy, to clean up the HTML before parsing it.

IIRC, the last time I was looking for a way to convert HTML to XML, HTML::TreeBuilder seemed to be the most robust parser available in Perl.
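For the HTML to XML conversion itself, something along these lines should be close. It is a rough sketch: html_to_xml is a hypothetical helper name and the input is invented; as_XML is the HTML::Element method that serializes the tree with every element closed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of any module): turn tag soup into a
# string in which every element is properly closed.
sub html_to_xml {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;

    my $xml = $tree->as_XML;    # serialize the repaired tree

    $tree->delete;              # free the tree explicitly
    return $xml;
}

# Made-up input: an unclosed <p> and a bare <br>.
print html_to_xml('<p>unclosed paragraph<br>with a bare break');
```

The resulting string could then be handed to an XML parser, at the cost of parsing every page twice.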

Re^2: Problem timing out XML::LibXML parsing
by samtregar (Abbot) on Feb 03, 2009 at 20:12 UTC
    Thanks, but yes, I really want to use XML::LibXML. It's so much faster than HTML::TreeBuilder and speed is critical in my application. So far it's actually been pretty reliable - this problem only occurs in around 1 out of every 100,000 or so pages I've parsed.

    -sam