in reply to Pondering Portals

HTML::Scrubber uses HTML::Parser under the hood and makes it extremely simple to allow or disallow tags. As Tanktalus says above, you will be better off choosing which tags to allow than trying to list every tag to forbid.

From the HTML::Scrubber POD:

#!/usr/bin/perl -w
use HTML::Scrubber;
use strict;

my $html = q[
<style type="text/css"> BAD { background: #666; color: #666;} </style>
<script language="javascript"> alert("Hello, I am EVIL!"); </script>
<HR>
    a => <a href=1>link </a>
    br => <br>
    b => <B> bold </B>
    u => <U> UNDERLINE </U>
];

my $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );

print $scrubber->scrub($html);

$scrubber->deny( qw[ p b i u hr br ] );

print $scrubber->scrub($html);

__END__
Output:

<hr>
    a =&gt; link
    br =&gt; <br>
    b =&gt; <b> bold </b>
    u =&gt; <u> UNDERLINE </u>

    a =&gt; link
    br =&gt;
    b =&gt; bold
    u =&gt; UNDERLINE

Re^2: Pondering Portals
by skx (Parson) on Apr 30, 2005 at 15:25 UTC

    I use HTML::Scrubber on one of my sites. The only problem I have with it (which I was vaguely thinking of posting as a new question only yesterday) is that I see no way to force an attribute to be present.

    Say the user submits:

    <a href="http://example.com">text</a>

    I would like to automatically insert, or mandate, the rel="nofollow" attribute and value. I can't see a simple way of doing this short of re-parsing the output with HTML::Parser myself or using a fragile regexp.

    That's the only shortcoming I see with HTML::Scrubber.

    Steve
    ---
    steve.org.uk
      Have you considered subclassing HTML::Scrubber? Below, I override its internal _validate method to inject rel="nofollow" into each anchor before the attribute rules are checked.

      $ cat XREL.pm
      package XREL;
      use strict;
      use base 'HTML::Scrubber';

      # Override the internal _validate method: force rel="nofollow" onto
      # every <a> tag before the normal attribute rules are applied.
      sub _validate {
          my ( $self, $tag, $raw, $attr, $attrseq ) = @_;
          if ( $tag eq 'a' ) {
              $attr->{rel} = 'nofollow';
              # Exact match, so only a real "rel" attribute counts as present.
              push @$attrseq, 'rel' unless grep { $_ eq 'rel' } @$attrseq;
          }
          return $self->SUPER::_validate( $tag, $raw, $attr, $attrseq );
      }

      1;

      $ cat scrub.pl
      #!/usr/bin/perl
      use warnings;
      use strict;
      use lib '.';    # XREL.pm sits next to this script
      use XREL;

      my $scrubber = XREL->new( allow => [ qw[ a p b i u hr br ] ] );
      $scrubber->rules(
          a => {
              href => 1,
              rel  => qr/^nofollow$/i,
              '*'  => 0,
          },
      );

      my $html = q[<a href="http://perlmonks.org">link </a>];
      print $scrubber->scrub($html), $/;

      $html = q[<a href="http://perlmonks.org" rel="nofollow">link </a>];
      print $scrubber->scrub($html), $/;

      $html = q[<a href="http://perlmonks.org" rel="xxx">link </a>];
      print $scrubber->scrub($html), $/;

      $html = q[<a href="http://perlmonks.org" rel="xnofollow">link </a>];
      print $scrubber->scrub($html), $/;

      __END__
      output:
      <a href="http://perlmonks.org" rel="nofollow">link </a>
      <a href="http://perlmonks.org" rel="nofollow">link </a>
      <a href="http://perlmonks.org" rel="nofollow">link </a>
      <a href="http://perlmonks.org" rel="nofollow">link </a>

      Update: changed xrel="nofollow" to rel="nofollow".
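      For comparison, here is a minimal sketch of the other route skx mentions: a second pass with HTML::Parser over output that has already been through HTML::Scrubber. It assumes HTML::Parser and HTML::Entities are installed, and add_nofollow is a hypothetical helper name, not part of either module. Every <a> start tag is rebuilt with rel="nofollow" forced on; all other markup is copied through unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Entities qw(encode_entities);

# Hypothetical helper: re-parse (already scrubbed) HTML and force
# rel="nofollow" onto every <a> start tag. Other markup passes through.
sub add_nofollow {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ( $tag, $attr, $attrseq, $text ) = @_;
                if ( $tag ne 'a' ) {
                    $out .= $text;    # non-anchor start tag: copy verbatim
                    return;
                }
                # Overwrite any existing rel and keep attribute order stable.
                $attr->{rel} = 'nofollow';
                push @$attrseq, 'rel' unless grep { $_ eq 'rel' } @$attrseq;

                my $tag_html = '<a';
                for my $name (@$attrseq) {
                    $tag_html .= sprintf ' %s="%s"', $name,
                      encode_entities( $attr->{$name} );
                }
                $out .= $tag_html . '>';
            },
            'tagname, attr, attrseq, text',
        ],
        # Text, end tags, comments, etc. are copied as-is.
        default_h => [ sub { $out .= shift }, 'text' ],
    );
    $parser->parse($html);
    $parser->eof;    # flush any buffered text
    return $out;
}

print add_nofollow(q[<p><a href="http://perlmonks.org">link</a></p>]), "\n";
# prints <p><a href="http://perlmonks.org" rel="nofollow">link</a></p>
```

      The trade-off: this parses the HTML twice, but it does not depend on _validate, which by Perl convention (the leading underscore) is an internal method that could change between HTML::Scrubber releases.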

        Perfect ++

        I admit I wasn't sure where to start, though I had made some attempts at hacking the original module to support 'mandatory' attributes.

        Steve