OK, now I'm confused. What I need to do is take an HTML page or file and parse it, then take its structure and write it back out as a string to an HTML or text file (I'll just use HTML). The output should show the document the way an XML file is usually displayed: root node, elements, attributes and so forth, laid out visually with their inherent relationships.
Then I believe you'll have to use the HTML-Tree CPAN module, which parses an HTML document into a tree object. You'll also need some XML module if you want to write XML output.
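Something along these lines might be a starting point. It is only a rough sketch: the sample HTML string and the `show_node` helper are made up for illustration, and it assumes HTML::TreeBuilder (part of the HTML-Tree distribution) is installed. It walks the tree and prints each element with its attributes, indented by depth, then prints the same tree through `as_XML`, which may be enough on its own if you only need simple XML output.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # from the HTML-Tree distribution on CPAN

# Hypothetical input, for illustration only. For a real file you could
# use HTML::TreeBuilder->new_from_file($path) instead.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p class="intro">Hello <b>world</b></p></body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# Hypothetical helper: recursively print a node, indented by its depth.
sub show_node {
    my ($node, $depth) = @_;
    my $indent = '  ' x $depth;

    if (ref $node) {
        # An HTML::Element: print its tag and its real (external) attributes.
        my %attr  = $node->all_external_attr;
        my $attrs = join '', map { qq{ $_="$attr{$_}"} } sort keys %attr;
        print $indent, '<', $node->tag, $attrs, ">\n";
        show_node($_, $depth + 1) for $node->content_list;
    }
    else {
        # A plain text segment.
        print $indent, qq{"$node"\n};
    }
}

show_node($tree, 0);

# Alternatively, let the module serialise the whole tree as XML.
print $tree->as_XML;

$tree->delete;    # release the tree when done
```

Redirect the script's output to a file (or print to a filehandle you opened yourself) to get the text or HTML file you mentioned.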
Alternatively, just run it through some XHTML tidier, pretending the HTML is bad XHTML.