in reply to Truncating HTML early
Have you considered using RSS? It's a totally different approach to your problem: instead of extracting data from the HTML and then trying to clean it up, RSS could let you build quick, clean overviews of the pages and place them in various "channels" on a front page. Perl.com has several good articles on RSS; a good one is at http://www.perl.com/pub/a/2001/11/15/creatingrss.html.
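To give a rough idea of what a "channel" looks like, here is a minimal sketch that builds a small RSS 2.0 feed by hand using only core Perl. The helper names (xml_escape, build_rss), the example.com URLs, and the item data are all made up for illustration. In practice you would probably reach for a CPAN module such as XML::RSS rather than concatenating strings, so treat this as a sketch of the idea, not a finished solution.

```perl
use strict;
use warnings;

# Escape the five XML special characters so titles and summaries
# stay well-formed inside the feed.
sub xml_escape {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',  '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&apos;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Build an RSS 2.0 document from a channel hashref and a list of item
# hashrefs. Each needs title, link and description (hypothetical layout).
sub build_rss {
    my ($channel, @items) = @_;
    my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
            . qq{<rss version="2.0">\n<channel>\n};
    for my $field (qw(title link description)) {
        $xml .= sprintf "  <%s>%s</%s>\n",
            $field, xml_escape($channel->{$field}), $field;
    }
    for my $item (@items) {
        $xml .= "  <item>\n";
        for my $field (qw(title link description)) {
            $xml .= sprintf "    <%s>%s</%s>\n",
                $field, xml_escape($item->{$field}), $field;
        }
        $xml .= "  </item>\n";
    }
    $xml .= "</channel>\n</rss>\n";
    return $xml;
}

# Made-up channel and item, just to show the shape of the data.
my $feed = build_rss(
    {
        title       => 'Example channel',
        link        => 'http://www.example.com/',
        description => 'Short overviews of recent pages',
    },
    {
        title       => 'First page',
        link        => 'http://www.example.com/first.html',
        description => 'A one-line summary & teaser',
    },
);
print $feed;
```

Each item carries only a title, a link and a short description, so the front page never has to deal with the full HTML of the original page.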
Cheers,