Keep in mind that the guts of HTML::Parser are written in C. Very fast C. If I recall correctly, the C version ran at something like 10 times the speed of the regex version, so I doubt you can hand-roll a regex parser, even for a subset of the problem, that beats the C version now.