Of course it's important to arrive at the wrong answer as fast as possible :). Most likely, the solutions are all slow because they build a complete DOM tree from the HTML, which gets expensive for large enough HTML files.
On the other hand, I had to look at your output, because I couldn't tell from your code what you want to extract and what you don't. Your code buries the extraction rules quite deep, while the XPath expressions reduce the code to mostly the extraction rules plus some boilerplate. Maybe you can keep the speed and gain some expressiveness by using a stream-based parser like XML::Twig, which is meant for applying rules on the way down the tree without loading the whole document.
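To make the XML::Twig idea concrete, here is a minimal sketch rather than a drop-in replacement for your code. The sample markup, the `item` class and the "collect href plus link text" rule are all made up for illustration, since I'm only guessing at your real extraction rules. It also assumes the input is well-formed XHTML, because XML::Twig sits on top of an XML parser and will die on tag soup.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input; the real document from the thread is not shown here.
my $xhtml = <<'END';
<html><body>
  <div class="item"><a href="/one">First</a></div>
  <div class="skip"><a href="/two">Second</a></div>
  <div class="item"><a href="/three">Third</a></div>
</body></html>
END

my @links;

my $twig = XML::Twig->new(
    twig_handlers => {
        # The extraction rule lives in the handler key: the sub is called
        # each time a <div class="item"> has been parsed completely.
        'div[@class="item"]' => sub {
            my ($t, $div) = @_;
            for my $link ($div->descendants('a')) {
                push @links, [ $link->att('href'), $link->text ];
            }
            # Throw away everything parsed so far to keep memory flat.
            $t->purge;
        },
    },
);

$twig->parse($xhtml);

print "$_->[0]\t$_->[1]\n" for @links;
```

The point is that the rules end up as the keys of `twig_handlers`, much like the XPath expressions, while `purge` keeps the parser from holding on to the whole tree. Whether it actually beats your HTML::Parser version is something you'd have to benchmark.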
In reply to Re^3: HTML::Parser fun
by Corion
in thread HTML::Parser fun
by FreakyGreenLeaky