Some forums don't allow any HTML at all and instead use tags based on square brackets, much like the node linking mechanism used here. This may be an easier approach, because you can neutralise all HTML very easily by entity-encoding everything you receive.

Picking what you want to allow is as simple as doing something like:
[b][/b] [i][/i] [link target=""]link text[/link]
Of course, that can add just as much complexity when it comes to testing for nesting, unclosed tags, etc. But I know of a few forums that still stick with this approach. Take a look at some of the open source ones and see how they tackle this problem (phpBB comes to mind).
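To make the idea concrete, here is a minimal sketch in Python (not Perl, and not how any particular forum does it). It encodes everything first and then turns only the three bracket tags shown above back into HTML. The function name `render_bbcode`, the `SIMPLE_TAGS` table and the http/https-only rule for link targets are my own assumptions for illustration. It deliberately does not check nesting, so mis-nested tags would still need handling.

```python
import html
import re

# Hypothetical whitelist: bracket tag name -> HTML element name.
SIMPLE_TAGS = {"b": "b", "i": "i"}

# After html.escape() the double quotes around the target become &quot;.
# Assumption: only http/https targets are accepted.
LINK_RE = re.compile(
    r"\[link target=&quot;(https?://[^\s\[\]]*?)&quot;\](.*?)\[/link\]",
    re.S,
)


def render_bbcode(text):
    """Escape all HTML, then re-introduce only the whitelisted bracket tags."""
    # Step 1: encode everything, so no user-supplied HTML survives.
    out = html.escape(text, quote=True)

    # Step 2: convert properly closed [b]...[/b] and [i]...[/i] pairs.
    # Unclosed tags do not match and stay as literal text.
    for name, element in SIMPLE_TAGS.items():
        pair = re.compile(r"\[%s\](.*?)\[/%s\]" % (name, name), re.S)
        out = pair.sub(r"<%s>\1</%s>" % (element, element), out)

    # Step 3: convert [link target="..."]text[/link] for accepted targets.
    out = LINK_RE.sub(r'<a href="\1">\2</a>', out)
    return out
```

Calling `render_bbcode('[b]hi[/b] <script>')` leaves the script tag encoded and only the bold pair converted.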

As always, YMMV. Parsing and stripping HTML is not a small task, but HTML::Parser does it very well, so using something based on that shouldn't present a problem.
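For comparison, the parser-based route looks roughly like this. This is a sketch using Python's standard `html.parser` module as a stand-in for HTML::Parser, so it is not that module's API. The `ALLOWED` set and the `Sanitizer` and `sanitize` names are assumptions of mine. It drops every attribute and every tag not on the list, and it does not balance unclosed tags.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist of tag names to let through.
ALLOWED = {"b", "i", "p", "code"}


class Sanitizer(HTMLParser):
    """Event-driven filter: keep whitelisted tags, escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes are discarded entirely (no onclick, style, etc.).
        if tag in ALLOWED:
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives with entities decoded, so re-encode it on output.
        self.out.append(escape(data))


def sanitize(markup):
    parser = Sanitizer()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Comments, doctype declarations and processing instructions are dropped simply because no handler is defined for them.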

In reply to Re: Sanitizing HTML by simon.proctor
in thread Sanitizing HTML by skx
