HTML::TagFilter does this admirably. It's a subclass of HTML::Parser and lets you specify which tags and attributes to allow or deny, much like what you're already doing. You'd probably need to tweak this a little to fit into your code the way you want, but it should do the trick.
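use HTML::TagFilter;

my $tf = HTML::TagFilter->new(
    allow => {
        # The original {'any'} is a one-element hash, which Perl warns about;
        # it is written here as an explicit pair with an empty value list.
        p    => { any => [] },
        i    => { any => [] },
        b    => { any => [] },
        code => { any => [] },
        br   => { any => [] },
        u    => { any => [] },
        pre  => { any => [] },
        img  => {
            width  => ['any'],
            height => ['any'],
            border => ['any'],
            src    => ['any'],
        },
        a => {
            href   => ['any'],
            target => ['any'],
            name   => ['any'],
        },
    },
    deny           => {},
    log_rejects    => 1,
    strip_comments => 1,
);

# Run a chunk of untrusted HTML through the filter and return the result.
sub filter_html {
    return $tf->filter(shift);
}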
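For what it's worth, here's a minimal sketch of calling the filter on its own, assuming HTML::TagFilter is installed. The small allow-list and the $dirty string are made up for illustration; only new, the allow/deny/strip_comments options, and filter come from the snippet above.

```perl
use strict;
use warnings;
use HTML::TagFilter;

# Hypothetical, deliberately small allow-list (illustration only).
my $tf = HTML::TagFilter->new(
    allow => {
        a   => { href => ['any'] },
        img => { src  => ['any'] },
    },
    deny           => {},
    strip_comments => 1,
);

# Made-up untrusted input: one tag on the allow-list, one that is not,
# plus an HTML comment.
my $dirty = '<a href="/home">hello</a><script>alert(1)</script><!-- note -->';
my $clean = $tf->filter($dirty);

print "$clean\n";
```

Print the result for a few of your own nasty inputs before trusting the allow-list; what happens to the text inside a rejected tag is worth checking by eye.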
Update: This module will freak out if you try to install or use it on anything earlier than perl 5.6, I believe because it uses the warnings pragma, which first shipped with 5.6. As another monk pointed out (I forget who, it was a while ago), you can just comment out the "use warnings;" line (or install warnings, I suppose) and it'll work fine.
-Any sufficiently advanced technology is
indistinguishable from doubletalk.
In reply to Re: Safe HTML output?
by Hero Zzyzzx
in thread Safe HTML output?
by gav^