After a good hard look at the docs, I figured it out: I used HTML::Parser directly, added all the handlers (as you said), and reconstructed the appropriate parts of the document.
I know I'd be better off with XML, but this was a one-off conversion of the HTML docs into something more 'edible'. Next time I have something like this, I'll try HTML::TokeParser.
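
For anyone finding this later, here is a minimal sketch of the handler approach. It is not the actual conversion script: the sample HTML and the list of tags to keep are made up for illustration. It assumes the version 3 API of HTML::Parser, registers start, end and text handlers, and rebuilds only the parts of the document that sit inside the kept elements.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical input; the real docs would be read from a file.
my $html = '<html><body><div class="nav">skip me</div>'
         . '<h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>';

# Hypothetical choice: elements whose content should be kept.
my %keep = map { $_ => 1 } qw(h1 p pre);

my $depth  = 0;     # greater than zero while inside a kept element
my $output = '';    # the reconstructed document fragment

my $parser = HTML::Parser->new(
    api_version => 3,
    # 'text' is the original source text of the token.
    start_h => [ sub {
        my ($tag, $text) = @_;
        $depth++ if $keep{$tag};
        $output .= $text if $depth;
    }, 'tagname, text' ],
    end_h => [ sub {
        my ($tag, $text) = @_;
        $output .= $text if $depth;
        if ($keep{$tag} && $depth) {
            $depth--;
            $output .= "\n" unless $depth;   # newline after each kept block
        }
    }, 'tagname, text' ],
    text_h => [ sub {
        my ($text) = @_;
        $output .= $text if $depth;
    }, 'text' ],
);

$parser->parse($html);
$parser->eof;    # signal end of input so buffered text is flushed

print $output;
```

Tags nested inside a kept element (the `<b>` above) are copied through as they appear in the source, while anything outside a kept element is dropped. Use `$parser->parse_file($filename)` instead of `parse`/`eof` to read straight from a file.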