in reply to Re: Custom XML tags in HTML::TreeBuilder and HTML::Element
in thread Custom XML tags in HTML::TreeBuilder and HTML::Element

Ah, my mistake. I've been spending all my time reading the HTML::Element docs, and have not spent enough time with ::TreeBuilder. Thanks a bunch, Eric! P.S.: It'd be handy if we could set ::TreeBuilder to allow only certain unknown tags -- in my case, just 'mig:' tags. I'd certainly like to clear out Microsoft's 'mso:' tags, for instance.

Re^3: Custom XML tags in HTML::TreeBuilder and HTML::Element
by Fletch (Bishop) on Jul 24, 2006 at 17:44 UTC

    Glancing through the source, it doesn't look as if it'd be too hard to patch it so that ignore_unknown could be a coderef instead of a boolean value. Then you could set it to a predicate which looks at the tag name and returns true if the tag should be ignored.

    ## ... circa line 152 of HTML/TreeBuilder.pm
    ## default predicate: ignore every unknown tag
    $self->{'_ignore_unknown'} = sub { 1 };

    ## ... circa line 660 in HTML/TreeBuilder.pm
    if( $self->{'_ignore_unknown'}->( $tag ) ) {
        print $indent, " * Ignoring unknown tag \U$tag\E\n" if DEBUG;
        $self->warning("Skipping unknown tag $tag");
        return;
    }

    ## ... later in your code
    ## true means "ignore"; only tags starting with mig: are kept
    $tree->ignore_unknown( sub { $_[0] !~ /^mig:/ } );

    As it stands, you'd have to make your own copy and/or edit the installed version rather than override it in a subclass. But still, it's easily doable (and if you get it working right, submit a patch :).
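
    The predicate itself can be tried out on its own before touching the module. This is only a sketch in plain Perl, with no HTML::TreeBuilder involved: make_ignore_predicate is a hypothetical helper name, not part of any module, and it assumes the patched ignore_unknown would call the coderef with the tag name as its only argument.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TreeBuilder): build a predicate
# that returns true for tags to ignore, i.e. every tag whose name does
# not start with one of the allowed prefixes.
sub make_ignore_predicate {
    my @allowed_prefixes = @_;

    # With no allowed prefixes, ignore everything (the stock behaviour).
    return sub { 1 } unless @allowed_prefixes;

    my $alternation = join '|', map { quotemeta } @allowed_prefixes;
    my $allowed_re  = qr/^(?:$alternation)/i;

    return sub {
        my ($tag) = @_;
        return $tag =~ $allowed_re ? 0 : 1;
    };
}

# Keep 'mig:' tags, ignore everything else that is unknown.
my $ignore = make_ignore_predicate('mig:');

for my $tag (qw(mig:note mso:style o:p)) {
    printf "%-10s %s\n", $tag, $ignore->($tag) ? 'ignored' : 'kept';
}
```

    If the patch above were in place, the resulting coderef is what you would hand to ignore_unknown. Building it from a list of prefixes makes it easy to allow a second namespace later without rewriting the regex by hand.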