My favourite is HTML::TokeParser::Simple. It hands you the document one token (a tag, a piece of text, a comment...) at a time, much as you would read a text file one line at a time. Each token is an object, so every kind of token is accessed the same way. For example, $token->as_is returns the token's original text, whatever kind of token it is.
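Here is a minimal sketch of that token-at-a-time loop. It assumes HTML::TokeParser::Simple is installed from CPAN. The sample HTML and the variable names are made up for illustration. The loop collects link targets and plain text. It also concatenates as_is for every token, which should rebuild the input unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample input; a file name or filehandle would work too.
my $html = '<p class="intro">Hello <a href="/docs">docs</a><!-- note --></p>';

my $parser = HTML::TokeParser::Simple->new(string => $html);

my $copy = '';    # the document rebuilt from the tokens
my @links;        # href values of <a> start tags
my $text = '';    # plain text with the markup stripped

# Pull one token at a time, much like reading a file line by line.
while (my $token = $parser->get_token) {
    # as_is works on every kind of token: start/end tag, text, comment...
    $copy .= $token->as_is;

    if ($token->is_start_tag('a')) {
        push @links, $token->get_attr('href');
    }
    elsif ($token->is_text) {
        $text .= $token->as_is;
    }
}

print "links: @links\n";    # links: /docs
print "text:  $text\n";     # text:  Hello docs
print $copy eq $html ? "round trip OK\n" : "round trip differs\n";
```

Because as_is is available on every token, a filter that changes only a few tags can copy everything else through untouched.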