in reply to Custom XML tags in HTML::TreeBuilder and HTML::Element

The docs for HTML::TreeBuilder seem to indicate that you can call $root->ignore_unknown(0); to tell it not to ignore those tags.
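
For illustration, here is a minimal sketch of that call in context. The sample markup and the 'mig:note' tag name are made up for the example; the only assumption about the library is that the flag is set before any HTML is parsed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample input containing one custom 'mig:' tag.
my $html = '<html><body><p>Hello <mig:note>kept</mig:note></p></body></html>';

my $root = HTML::TreeBuilder->new;
$root->ignore_unknown(0);    # set before parsing so unknown tags are kept
$root->parse($html);
$root->eof;

# Look for the custom element by its (lowercased) tag name.
my @custom = $root->look_down( _tag => 'mig:note' );
print scalar(@custom), " mig:note element(s) found\n";

$root->delete;               # free the tree when finished with it
```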


___________
Eric Hodges

Re^2: Custom XML tags in HTML::TreeBuilder and HTML::Element
by slaniel (Acolyte) on Jul 24, 2006 at 17:04 UTC
    Ah, my mistake. I've been spending all my time reading the HTML::Element docs, and have not spent enough time with ::TreeBuilder. Thanks a bunch, Eric! P.S.: It'd be handy if we could set ::TreeBuilder to allow only certain unknown tags -- in my case, just 'mig:' tags. I'd certainly like to clear Microsoft's 'mso:' tags out, for instance.

      Glancing through the source, it doesn't look as if it'd be too hard to patch it so that ignore_unknown could be a coderef instead of a boolean value. Then you could set it to a predicate which looks at the tag name and returns whether or not to ignore it.

      ## ... circa line 152 of HTML/TreeBuilder.pm
      $self->{'_ignore_unknown'} = sub { 1 };

      ## ... circa line 660 in HTML/TreeBuilder.pm
      if( $self->{'_ignore_unknown'}->( $tag ) ) {
          print $indent, " * Ignoring unknown tag \U$tag\E\n" if DEBUG;
          $self->warning("Skipping unknown tag $tag");
          return;
      }

      ## ... later in your code
      $tree->ignore_unknown( sub { return 1 if $_[0] !~ /^mig:/ } );

      As it stands, you'd have to make your own copy and/or edit the installed version rather than overriding in a subclass. But still, it's easily doable (and if you get it working right, submit a patch :).
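
The predicate idea can be tried on its own, without patching anything. This is only a sketch: the variable name $ignore_unknown and the sample tag names are hypothetical, and it assumes tag names reach the predicate already lowercased. A true return value means "ignore this unknown tag".

```perl
use strict;
use warnings;

# Hypothetical predicate: returns 1 to ignore a tag, 0 to keep it.
# Only tags in the 'mig:' namespace are kept.
my $ignore_unknown = sub {
    my ($tag) = @_;
    return $tag !~ /^mig:/ ? 1 : 0;
};

# Try it against a few sample tag names (made up for the example).
for my $tag (qw(mig:note mso:style o:p)) {
    printf "%-10s %s\n", $tag, $ignore_unknown->($tag) ? 'ignored' : 'kept';
}
```

Returning an explicit 1 or 0 keeps the result unambiguous in both scalar and list context, which a bare "return 1 if ..." does not guarantee.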