in reply to Extracting paragraphs from html

As you noticed, parsing HTML with regexes gets messy when the tags keep changing.

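You might want to look at HTML::TokeParser::Simple. It hands you the document as a stream of tokens (start tags, end tags, text), so you don't have to match the markup by hand.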
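
Here is a rough sketch of how paragraph extraction might look with that module. The helper name extract_paragraphs and the sample markup are made up for illustration, and I'm assuming you only want the text inside each <p>, with any nested tags dropped. Treat it as a starting point, not a finished solution.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: returns the text of each <p> element found in $html.
# Nested tags such as <b> are dropped; only their text is kept.
sub extract_paragraphs {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( string => $html );

    my @paragraphs;
    my $current;    # undef while we are outside a <p>

    while ( my $token = $parser->get_token ) {
        if ( $token->is_start_tag('p') ) {
            # A new <p> implicitly closes one that was never closed.
            push @paragraphs, $current if defined $current;
            $current = '';
        }
        elsif ( $token->is_end_tag('p') ) {
            push @paragraphs, $current if defined $current;
            undef $current;
        }
        elsif ( $token->is_text && defined $current ) {
            # as_is returns the raw text; entities are not decoded.
            $current .= $token->as_is;
        }
    }

    # Catch a trailing <p> with no closing tag.
    push @paragraphs, $current if defined $current;
    return @paragraphs;
}

# Sample markup, invented for this example.
my $html = '<p>one</p><div><P class="x">two <b>bold</b></P></div>';
print "$_\n" for extract_paragraphs($html);
```

Because the parser works on tokens, changes in attributes, tag case, or surrounding markup should not matter the way they do with a regex. If you need entities such as &amp; decoded, you would have to do that yourself, for example with HTML::Entities.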

-SK