
Tag balancing is a feature (indeed, a requirement) of XML, not of HTML. If you are parsing XML, you want an XML module. If you're parsing HTML (that is, HTML you may get from an arbitrary site over which you have no control), you need the HTML modules, such as HTML::Parser or HTML::TreeBuilder.

If your documents are well-formed, then I would suggest XML::Twig. I use it to take HTML tables, copy the header row, and reinsert it every 5th, 10th, or nth row, so that in long tables you don't need to scroll back to the top of the screen to find it. I also use it to alternate row background colours: it sets the class attribute to "odd" or "even", and CSS does the actual colouring.
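The header-repeating and odd/even trick is easy to sketch. The sketch below is not XML::Twig code. It uses Python's standard-library xml.etree.ElementTree as a stand-in so that it runs without extra modules. It assumes the table is a well-formed XHTML string whose `<tr>` elements are direct children of `<table>` (no `<thead>`/`<tbody>`), and that the first row is the header. The function name `decorate_table` and the `repeat_every` parameter are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def decorate_table(xhtml, repeat_every=10):
    """Hypothetical helper: repeat the header row every `repeat_every` body rows
    and mark body rows with class="odd"/"even" so CSS can colour them."""
    table = ET.fromstring(xhtml)          # fails on unbalanced tags (XML rules)
    rows = table.findall("tr")            # assumes <tr> directly under <table>
    header, body = rows[0], rows[1:]      # assumes the first row is the header

    # Detach all rows, then rebuild them in the new order.
    for row in rows:
        table.remove(row)
    table.append(header)

    for i, row in enumerate(body, start=1):
        row.set("class", "odd" if i % 2 else "even")
        table.append(row)
        # Re-insert a copy of the header after every Nth row, but not after the last.
        if i % repeat_every == 0 and i < len(body):
            table.append(copy.deepcopy(header))

    return ET.tostring(table, encoding="unicode")
```

Leaving the colours to a stylesheet (for example `tr.odd { background: #eee; }`) keeps the markup change down to a single attribute per row, which is the same division of labour described above.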

But if the text comes from a user submitting a form, someone over whom you have no control, you're probably better off assuming that their tags may not be balanced.
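One way to act on that assumption is to test a submission for well-formedness before handing it to XML tools, and to fall back to a tolerant HTML parser otherwise. The sketch below again uses Python's standard library (xml.etree.ElementTree and html.parser) rather than the Perl modules discussed above. The names `is_well_formed` and `TagCounter` are hypothetical. Note that an XML parser also rejects HTML-only named entities such as `&nbsp;`, so "not well-formed" does not always mean "unbalanced".

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

def is_well_formed(fragment):
    """Hypothetical check: True if the fragment parses as XML (balanced tags)."""
    try:
        # Wrap in a root element so several top-level tags are allowed.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False

class TagCounter(HTMLParser):
    """Tolerant HTML parser: counts start tags and never rejects unbalanced input."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1

if __name__ == "__main__":
    user_text = "<p>Hello <b>world</p>"    # <b> is never closed
    print(is_well_formed(user_text))       # False: an XML parser refuses it
    counter = TagCounter()
    counter.feed(user_text)                # the HTML parser copes regardless
    counter.close()
    print(counter.start_tags)              # 2
```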