You might try running the truncated text through HTML Tidy. Assuming the original HTML is as clean as you claim it to be, the only problems tidy will need to address are the dangling tags left open at the end of the file.
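For what it's worth, here is a minimal sketch of that idea in Python. It assumes the `tidy` command-line program is installed and on your PATH. The function name `tidy_fragment` and the `TIDY_CMD` list are made up for illustration, and this particular set of options is just one reasonable choice, not the only way to call it.

```python
import shutil
import subprocess

# Options passed to the tidy command-line tool:
#   -q                     suppress the summary banner
#   -utf8                  treat input and output as UTF-8
#   --show-body-only yes   print only the body contents, no <html>/<head> wrapper
#   --show-warnings no     keep per-tag warnings off stderr
TIDY_CMD = ["tidy", "-q", "-utf8", "--show-body-only", "yes", "--show-warnings", "no"]


def tidy_fragment(truncated_html: str) -> str:
    """Pipe a truncated HTML fragment through tidy and return the repaired markup.

    Hypothetical helper: assumes the `tidy` executable is available on PATH.
    """
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")

    result = subprocess.run(
        TIDY_CMD,
        input=truncated_html,
        capture_output=True,
        text=True,
        encoding="utf-8",
    )

    # tidy exits 0 when the input was clean, 1 when it only issued warnings
    # (expected here, since tags were left open), and 2 on real errors.
    if result.returncode >= 2:
        raise RuntimeError("tidy reported errors: " + result.stderr.strip())

    return result.stdout
```

Calling `tidy_fragment("<p>Some <b>bold text")` should hand back the same text with the open `<b>` and `<p>` closed. The exact whitespace and line wrapping depend on your tidy version and configuration, so treat the output as something to inspect rather than compare byte for byte.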
This sounds like a great solution, blakem -- just smash the code and let something else clean it up! Laziness as a Virtue. Will try this later today -- thanks.