bbrelin has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I am trying to use HTML::Scrubber to strip unwanted HTML tags from my data. Unfortunately, it works a little too well and strips hrefs from anchor tags unbidden. Here's an example of before and after output:

Before:

<a href="bsc_view.asp?id=217247&amp;DataType=busop">PROJET DE REHABILITATION DU PERIMETRE DU BAS MANGOKY</a>

After:

<a>PROJET DE REHABILITATION DU PERIMETRE DU BAS MANGOKY</a>

Following is the test code I wrote:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Scrubber;

my $htmlstring = '<a href="bsc_view.asp?id=217247&amp;DataType=busop">PROJET DE REHABILITATION DU PERIMETRE DU BAS MANGOKY</a>';
my $scrubber = HTML::Scrubber->new();
$scrubber->default(1);    # allow all tags by default
my $foo = $scrubber->scrub($htmlstring);
print "foo = $foo\n";

The documentation says that calling the 'default' method with a param of 1 allows all tags, so why is it nuking the hrefs? I'm using Ubuntu Linux 11.10 and Perl 5.12, if that matters. Any help appreciated...

Re: HTML::Scrubber stripping hrefs when I don't want it to
by tangent (Parson) on Mar 12, 2012 at 17:57 UTC
    $scrubber->default(1) allows the tags ('a' etc.) but not the tags' attributes ('href' etc.). To allow the 'href' attribute, you need to change it to:
    # allow href
    $scrubber->default(1, {'href' => 1});

    # or allow all tag attributes
    $scrubber->default(1, {'*' => 1});
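    To see the difference on the string from the question, here is a minimal sketch (assuming HTML::Scrubber is installed from CPAN; the variable names are only illustrative) that scrubs the same link twice, once with default(1) and once with default(1, { href => 1 }). The second argument to default is the default attribute rule set, so only the second scrubber keeps the href.

```perl
#!/usr/bin/perl
# Sketch: default(1) vs. default(1, { href => 1 }) on the question's input.
use strict;
use warnings;
use HTML::Scrubber;

my $html = '<a href="bsc_view.asp?id=217247&amp;DataType=busop">'
         . 'PROJET DE REHABILITATION DU PERIMETRE DU BAS MANGOKY</a>';

# Tags allowed by default, but no default attribute rules: href is dropped.
my $tags_only = HTML::Scrubber->new();
$tags_only->default(1);
my $stripped = $tags_only->scrub($html);

# Tags allowed by default, plus href allowed as a default attribute rule.
my $with_href = HTML::Scrubber->new();
$with_href->default(1, { href => 1 });
my $kept = $with_href->scrub($html);

print "default(1):              $stripped\n";
print "default(1, {href => 1}): $kept\n";
```

    Allowing only href is usually safer than '*' => 1, which also lets through attributes such as onclick or style.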
    and it's more like a doorman than a WMD :)
Re: HTML::Scrubber stripping hrefs when I don't want it to
by moritz (Cardinal) on Mar 12, 2012 at 17:51 UTC
    The documentation says that calling the 'default' method with a param of 1 allows all tags, so why is it nuking the hrefs?

    Because href is an attribute, not a tag.
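    Because tag rules and attribute rules are separate, you can also control them per tag with the rules method. The sketch below is illustrative (the sample HTML is made up, and the regex is a simple filter, not a complete XSS defense): it allows only <a>, keeps href only when it does not start with "javascript" or "script", and drops every other attribute on <a>. Tags without a rule are removed, but their text content is kept.

```perl
#!/usr/bin/perl
# Sketch: per-tag attribute rules with HTML::Scrubber->rules.
use strict;
use warnings;
use HTML::Scrubber;

my $scrubber = HTML::Scrubber->new();
$scrubber->default(0);    # tags without an explicit rule are removed
$scrubber->rules(
    a => {
        href => qr{^(?!(?:java)?script)}i,  # keep href only if this regex matches
        '*'  => 0,                          # drop every other attribute on <a>
    },
);

my $html = '<a href="javascript:alert(1)" onclick="x()">bad</a> '
         . '<a href="page.asp" title="t">ok</a> <b>bold</b>';
my $out = $scrubber->scrub($html);
print "$out\n";
```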