Maybe you should be using XML::TokeParser or XML::Twig, or at any rate some XML parser, since your input is XML. XHTML is always XML, but not the other way around. And XHTML is XML, not HTML.