Use HTML::Tidy (Re: substr(ingifying) htmlized text)

The way I'd do it is to take the first ~1000 characters of the page, tags and all, run s/<[^>]*$// on that to remove any tag truncated at the end, and then feed the result to HTML::Tidy.

It should close any open tags and give you back a shiny, happy, valid HTML document fragment.
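
Here is a rough, untested-against-your-data sketch of that approach. The helper name truncate_html, the sample markup, and the 45-character limit are made up for illustration; the only HTML::Tidy calls used are new() and clean(). Note that tidy normally hands back a complete document (doctype, head, body) rather than a bare fragment, so you may want to look at tidy's show-body-only option if you only need the body.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $limit characters of $html,
# then drop a tag that got cut in half at the end of the snippet.
sub truncate_html {
    my ($html, $limit) = @_;
    my $snippet = substr($html, 0, $limit);
    $snippet =~ s/<[^>]*$//;    # remove a trailing "<..." with no closing ">"
    return $snippet;
}

# Sample input, invented for this example.
my $page = '<div><p>Some <b>bold</b> text and a '
         . '<a href="http://example.com/">link</a></p></div>';

# A limit of 45 cuts the string in the middle of the <a ...> tag.
my $snippet = truncate_html($page, 45);

# HTML::Tidy is a CPAN module, not core, so load it at run time and
# fall back to printing the raw snippet if it is not installed.
if (eval { require HTML::Tidy; 1 }) {
    my $tidy = HTML::Tidy->new({ tidy_mark => 0 });
    print $tidy->clean($snippet);    # tidy closes the open <div> and <p>
}
else {
    print "$snippet\n";
}
```

For the real thing you would pass your page and 1000 instead of the sample string and 45.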

2005-09-27 Retitled by g0n, as per Monastery guidelines
Original title: 'HTML::Tidy'
