Re: Sanitizing HTML

Some forums don't allow any HTML at all and instead use tags based on square brackets, much like the node linking mechanism used here. Perhaps this would be an easier approach, as you can neutralize all HTML very easily by entity-encoding everything you receive.

Picking what you want to allow is as simple as doing something like:

[b][/b]
[i][/i]
[link target=""]link text[/link]
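
A rough sketch of the encode-everything-then-whitelist idea, in Python purely for illustration. The names render, SIMPLE_TAGS and LINK_RE are made up for this example, and restricting link targets to http/https is my own assumption rather than anything a particular forum does:

```python
import html
import re

# Hypothetical whitelist: simple paired bracket tags and their HTML equivalents.
SIMPLE_TAGS = {"b": "b", "i": "i"}

# After escaping, the double quotes around the target appear as &quot;.
LINK_RE = re.compile(r'\[link target=&quot;(.*?)&quot;\](.*?)\[/link\]', re.S)

def _link(match):
    target, label = match.group(1), match.group(2)
    # Assumption: only plain web links are allowed; anything else
    # (javascript:, data:, ...) is reduced to its link text.
    if not target.startswith(("http://", "https://")):
        return label
    return '<a href="%s">%s</a>' % (target, label)

def render(text):
    # Step 1: encode everything, so no user-supplied HTML survives.
    out = html.escape(text, quote=True)
    # Step 2: turn the whitelisted bracket tags back into real HTML.
    for name, tag in SIMPLE_TAGS.items():
        pattern = r'\[%s\](.*?)\[/%s\]' % (name, name)
        out = re.sub(pattern, r'<%s>\1</%s>' % (tag, tag), out, flags=re.S)
    return LINK_RE.sub(_link, out)
```

Because the escaping happens first, anything the whitelist doesn't recognize stays as harmless text. Note that this sketch only converts well-formed pairs and makes no attempt to check nesting.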

Of course, that can add just as much complexity when it comes to testing for nesting, unclosed tags, etc. But I know of a few forums that still stick with this approach. Take a look at some of the open source ones and see how they tackle this problem (phpBB comes to mind).
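
To give an idea of what the nesting and unclosed-tag checks involve, here is a minimal stack-based sketch (Python again, just for illustration). The names ALLOWED, TAG_RE and check_nesting are hypothetical, and this is not how any particular forum package does it:

```python
import re

# Hypothetical set of bracket tags that must be opened and closed in pairs.
ALLOWED = {"b", "i", "link"}

# Matches [name], [name attr="..."] and [/name].
TAG_RE = re.compile(r'\[(/?)(\w+)[^\]]*\]')

def check_nesting(text):
    """Return a list of problems found; an empty list means the tags balance."""
    stack = []
    problems = []
    for match in TAG_RE.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if name not in ALLOWED:
            continue  # unknown bracket text is left alone as plain text
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append("unexpected [/%s]" % name)
    # Anything still on the stack was opened but never closed.
    problems.extend("unclosed [%s]" % name for name in stack)
    return problems
```

What you do with the problems is a design choice: reject the post, show the offending tags as plain text, or close them automatically.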

As always, YMMV. Parsing and stripping HTML is not a small task, but HTML::Parser does it very well, so using something based on that shouldn't present a problem.
