You need, at minimum, an HTML parser such as HTML::Parser. But if the HTML qualifies as XHTML (that is, it is well-formed XML), you might be able to use XPath expressions (via XML::LibXML) to extract the portions of interest directly, without writing much procedural code at all. Perl has many ready-built options for this on CPAN; consider all of them before writing your own parsing code.
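
To illustrate the XPath approach, here is a minimal sketch, assuming well-formed XHTML input. The sample markup, the "item" class, and the "x" namespace prefix are all made up for the example; substitute whatever your real document uses.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, well-formed XHTML input (invented for this example).
my $xhtml = <<'END';
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="item"><a href="/a">First</a></div>
    <div class="item"><a href="/b">Second</a></div>
  </body>
</html>
END

my $doc = XML::LibXML->load_xml(string => $xhtml);

# XHTML elements live in a namespace, so register a prefix
# ("x" is an arbitrary choice) to use in the XPath expressions.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

# One XPath expression selects every link inside a div of class "item".
my @links = map { [ $_->getAttribute('href'), $_->textContent ] }
    $xpc->findnodes('//x:div[@class="item"]/x:a');

print "$_->[0]\t$_->[1]\n" for @links;
```

If the input is ordinary, not-quite-well-formed HTML, load_xml will most likely refuse it. In that case XML::LibXML's load_html with the recover option may still get you a document you can query, or you can fall back to HTML::Parser or one of the tree-building modules built on top of it.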