Re: Parser for Html

Another suggestion would be XML::LibXML. It has a mode for parsing HTML; once you have the document parsed, you can go at it with XPath and do all the other things that can be done with XML. It won’t cope well if your HTML input is more than moderately broken tagsoup, however. Personally, I’d run such input through HTML::Tidy first so I can stick with XML::LibXML anyway, but you may prefer HTML::Parser or one of its derived modules, in which case HTML::TokeParser::Simple is probably your best bet.
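To make the XML::LibXML route concrete, here is a minimal sketch of parsing an HTML string and pulling out links with XPath. The sample markup, the example.com URLs, and the variable names are made up for illustration, and it assumes a reasonably recent XML::LibXML that provides load_html (older versions would use parse_html_string on a parser object instead).

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; substitute your own HTML.
my $html = <<'HTML';
<html><body>
<p>See <a href="http://example.com/one">first</a> and
<a href="http://example.com/two">second</a>.</p>
</body></html>
HTML

# recover => 1 lets libxml2 carry on past minor markup errors;
# suppress_errors keeps it from reporting them on STDERR.
my $doc = XML::LibXML->load_html(
    string          => $html,
    recover         => 1,
    suppress_errors => 1,
);

# Collect [href, link text] pairs for every anchor that has an href.
my @links = map { [ $_->getAttribute('href'), $_->textContent ] }
            $doc->findnodes('//a[@href]');

print "$_->[0] ($_->[1])\n" for @links;
```

The recover option only helps with mildly broken markup; for real tagsoup the caveat above still applies.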

Makeshifts last the longest.
