Tag balancing is a requirement of XML, not of HTML. If you are parsing XML, you want an XML module. But if you're parsing HTML (as in, the HTML that you may get from an arbitrary site over which you have no control), you need the HTML modules, because nothing guarantees that such input closes its tags.

If your documents are well-formed, then I would suggest XML::Twig. I use XML::Twig for things such as taking HTML tables, copying the header row, and reinserting it every 5th or 10th or whatever-th row so that, in long tables, you don't need to scroll back to the top of the screen to find it. I also use it to alternate background colours on rows, by setting the class attribute to "odd" or "even" and letting CSS actually do the colouring.
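Something along these lines is how the table trick could look. This is only a sketch, not my actual code: restripe_table and its $every parameter are names made up for the example, and it assumes the first row of each table is the header and that tables are not nested.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: mark data rows "odd"/"even" and repeat the header
# row after every $every data rows. Assumes well-formed input, a header in
# the first <tr>, and no nested tables.
sub restripe_table {
    my ($xhtml, $every) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            table => sub {
                my ($t, $table) = @_;
                my @rows   = $table->descendants('tr');
                my $header = shift @rows or return;   # nothing to do for an empty table
                my $count  = 0;
                for my $row (@rows) {
                    $count++;
                    $row->set_att(class => $count % 2 ? 'odd' : 'even');
                    # Copy the header in after every $every-th data row,
                    # but not after the very last row.
                    if ($count % $every == 0 && $count < @rows) {
                        $header->copy->paste(after => $row);
                    }
                }
            },
        },
    );

    $twig->parse($xhtml);    # dies here if the input is not well-formed
    return $twig->sprint;
}

# Example: one header row plus four data rows, header repeated every 2 rows.
my $in = '<table><tr><th>n</th></tr>'
       . join('', map { "<tr><td>$_</td></tr>" } 1 .. 4)
       . '</table>';
print restripe_table($in, 2), "\n";
```

Note that the parse call dies on unbalanced input, which is exactly why this approach only suits documents you know to be well-formed.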

But if a user over whom you have no control will be sending you text through a form, you're probably better off assuming that they may not balance their tags.
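For that case, a forgiving HTML parser can do the balancing for you. As a rough sketch, assuming HTML::TreeBuilder as the HTML module (one possible choice among several) and with balance_html being a name made up for the example:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse possibly unbalanced HTML and serialise it
# again with every end tag written out.
sub balance_html {
    my ($untrusted) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($untrusted);
    # The empty hash as third argument means "no end tags are optional",
    # so every element that was opened gets an explicit close.
    my $html = $tree->as_HTML(undef, '  ', {});
    $tree->delete;    # free the parse tree
    return $html;
}

# The <b> and <i> below are never closed in the input.
print balance_html('<p>Some <b>bold <i>text</p>'), "\n";
```

The result is a whole document (the parser adds html, head and body elements by default), so you would still need to pull out the body content and filter it against your list of allowed tags. Balancing is not the same as sanitizing.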


In reply to Re: sanitizing and balancing by Tanktalus
in thread Pondering Portals by hacker
