in reply to substr(ingifying) htmlized text
The way I'd do it is to take the first ~1000 characters of the page, tags and all, apply s/<[^>]*$// to strip any tag that got cut off at the end, and then feed the result to HTML::Tidy.
It should close any open tags and give you back a shiny, happy, valid HTML document fragment.
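A minimal sketch of the steps above, assuming a 1000-character cutoff and a default-constructed HTML::Tidy object. The sub names `truncate_html` and `tidy_fragment` are hypothetical, and the tidy step is guarded so the truncation part still runs where HTML::Tidy isn't installed.

```perl
use strict;
use warnings;

# Take the first $limit characters, tags and all, then drop a tag
# that was cut off partway (a trailing '<' with no closing '>').
sub truncate_html {
    my ($html, $limit) = @_;
    $limit //= 1000;    # assumed default, per "~1000 characters"
    my $snippet = substr($html, 0, $limit);
    $snippet =~ s/<[^>]*$//;
    return $snippet;
}

# Let HTML::Tidy close any tags left open by the truncation.
# Returns the snippet unchanged if HTML::Tidy can't be loaded.
sub tidy_fragment {
    my ($snippet) = @_;
    return $snippet unless eval { require HTML::Tidy; 1 };
    my $tidy = HTML::Tidy->new;
    return $tidy->clean($snippet);
}
```

Call it as `tidy_fragment(truncate_html($page))`, where `$page` holds the full HTML of the page.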
2005-09-27 Retitled by g0n, as per Monastery guidelines
Original title: 'HTML::Tidy'