I would definitely use HTML::TokeParser. I used it for parsing news headlines and it made life so much simpler. This node is the parser that I wrote to dump a page into tokens. Hopefully that will get you going in the direction you're after.
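For a rough idea of what a token dump with HTML::TokeParser looks like, here is a minimal sketch. It is not the code from that node: the `dump_tokens` helper and the sample HTML are made up for illustration, and it only handles start tags, end tags, text, and comments.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns one description line per token found in $html.
sub dump_tokens {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string
    # instead of treating the argument as a file name.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser: $!";

    my @lines;
    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'S') {
            # Start tag: ["S", $tag, $attr, $attrseq, $text]
            push @lines, "START $token->[1]";
        }
        elsif ($type eq 'E') {
            # End tag: ["E", $tag, $text]
            push @lines, "END $token->[1]";
        }
        elsif ($type eq 'T') {
            # Text: ["T", $text, $is_data]; skip whitespace-only runs
            my $text = $token->[1];
            next unless $text =~ /\S/;
            push @lines, "TEXT $text";
        }
        elsif ($type eq 'C') {
            # Comment: ["C", $text]
            push @lines, "COMMENT $token->[1]";
        }
    }
    return @lines;
}

# Made-up sample page, standing in for a real news page.
my $html = '<html><body><h1 class="headline">Big News</h1>'
         . '<p>Story text.</p></body></html>';
print "$_\n" for dump_tokens($html);
```

Once you can see the token stream, picking out headlines is mostly a matter of watching for the start tag you care about and grabbing the text token that follows it.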
Hope that helps!
Useless trivia: In the 2004 Las Vegas phone book there are approximately 28 pages of ads for massage, but almost 200 for lawyers.