HTML Tidy together with HTML::TreeBuilder is a powerful combination for badly formed HTML like this.
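
For what it's worth, here is a minimal sketch of how the two might be chained, assuming the HTML::Tidy CPAN module is the Tidy binding in use. The helper name `extract_links`, the sample markup, and the choice of pulling out link targets are all my own illustration, not something from the thread. The idea is simply to let Tidy repair the markup first and then hand the cleaned result to HTML::TreeBuilder for parsing.

```perl
use strict;
use warnings;
use HTML::Tidy;
use HTML::TreeBuilder;

# Hypothetical helper: clean the markup with Tidy, then parse the
# cleaned document and return the href of every <a> that has one.
sub extract_links {
    my ($html) = @_;

    # tidy_mark => 0 only stops Tidy adding its own "generator" meta tag.
    my $tidy  = HTML::Tidy->new( { tidy_mark => 0 } );
    my $clean = $tidy->clean($html);

    my $tree  = HTML::TreeBuilder->new_from_content($clean);
    my @hrefs = map { $_->attr('href') }
                $tree->look_down( _tag => 'a', href => qr/./ );

    # Free the tree explicitly; older HTML::Element versions need this.
    $tree->delete;

    return @hrefs;
}

# Made-up sample input with unclosed <p> and <a> elements.
my $messy = '<p>First para <a href="/one">one</a><p>Second <b>para <a href="/two">two</a>';
print "$_\n" for extract_links($messy);
```

HTML::TreeBuilder is fairly forgiving on its own, so whether the Tidy pass is needed depends on how broken the input is. It tends to matter most when the nesting is bad enough that the parsed tree comes out in a shape you did not expect.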
In reply to Re^3: Parsing badly formed HTML by wfsp in thread Parsing badly formed HTML by SilasTheMonk