Not only will HTML::TreeBuilder do fine, but if it's an HTML file an XML parser is likely to die quickly on it. XML parsers are required to fail on invalid XML, while HTML parsers are allowed to be more forgiving (e.g. HTML::TreeBuilder defaults to inserting implicit end tags that would cause an XML parser to quit)
Comment on Re^3: How to extract text between two tags?