If I understand correctly what you're after, you might want to look at
HTML::TokeParser. Parsing the HTML into tokens will give you all the bold, italics, and color information, and you should be able to take it from there.
Don't try to parse the HTML yourself; you'll drive yourself nuts.
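For what it's worth, here's a rough sketch of the token approach. I'm guessing at your markup, so the sample string, the choice of b/i/font tags, and the @runs structure are all my own assumptions, not anything from your code. It walks the tokens from get_token and records which formatting is in effect for each piece of text.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up sample input (an assumption); substitute your own document.
my $html = '<p>Plain <b>bold</b> and <i>italic <font color="red">red</font></i> text.</p>';

my $p = HTML::TokeParser->new(\$html) or die "Can't set up parser: $!";

my %open;      # count of currently open <b> and <i> tags
my @colors;    # stack of <font color="..."> values in effect
my @runs;      # one entry per text token: [text, bold, italic, color]

while (my $token = $p->get_token) {
    my $type = $token->[0];
    if ($type eq 'S') {                      # start tag: ["S", $tag, $attr, ...]
        my ($tag, $attr) = @{$token}[1, 2];
        $open{$tag}++ if $tag eq 'b' || $tag eq 'i';
        push @colors, $attr->{color} if $tag eq 'font';
    }
    elsif ($type eq 'E') {                   # end tag: ["E", $tag, $text]
        my $tag = $token->[1];
        $open{$tag}-- if $open{$tag};
        pop @colors if $tag eq 'font';
    }
    elsif ($type eq 'T') {                   # text: ["T", $text, $is_data]
        push @runs, [
            $token->[1],
            $open{b} ? 1 : 0,
            $open{i} ? 1 : 0,
            $colors[-1] // 'default',        # 'default' is just a placeholder label
        ];
    }
}

for my $run (@runs) {
    my ($text, $bold, $italic, $color) = @$run;
    printf "%-10s bold=%d italic=%d color=%s\n", "'$text'", $bold, $italic, $color;
}
```

This only tracks b, i, and font color. If your HTML uses strong, em, or inline styles instead, you'd need to extend the start and end tag branches to match.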
HTH!