The short answer is, probably not with HTML::TreeBuilder, but maybe with a more general SGML parser.
The longer answer is that what you're parsing is not, strictly speaking, valid HTML. Header tags have not been allowed to be nested in *any* version of HTML, *ever*, not even in the ridiculous horrible terrible awful unparseable messy HTML of the Netscape 3-4 era, and *certainly* not in any vaguely recent W3C specification. Consequently, an HTML parser is very unlikely to preserve such a construct. Frankly, if it did, I would call that a bug.
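If you want to see what the parser does with your own input, here is a minimal sketch. The sample markup and the helper name `has_nested_heading` are made up for illustration, and what it prints may depend on your HTML::TreeBuilder version, so treat it as a way to check rather than as a statement of what you will get.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical input: an <h2> nested inside an <h1>, which no HTML DTD allows.
my $html = '<h1>Outer <h2>Inner</h2> tail</h1>';

# Returns true if any h1-h6 element in the parsed tree still has
# another heading element among its ancestors.
sub has_nested_heading {
    my ($source) = @_;
    my $tree   = HTML::TreeBuilder->new_from_content($source);
    my $nested = 0;
    for my $h ( $tree->look_down( _tag => qr/^h[1-6]$/ ) ) {
        $nested = 1 if grep { $_->tag =~ /^h[1-6]$/ } $h->lineage;
    }
    $tree->delete;    # free the tree explicitly
    return $nested;
}

print has_nested_heading($html)
    ? "nesting preserved\n"
    : "nesting was not preserved\n";
```

Calling `$tree->dump` before the `delete` is a quick way to look at the exact tree the parser built.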
It *is* possible to rig up a parser that *can* preserve such things, but it would probably have to be based on a general SGML parser rather than something HTML-specific, since, as noted, what you're parsing isn't technically HTML. And it raises the question of why you would *want* to preserve nested header tags. If it were me, I would want that sort of thing to go away, fast.
In reply to Re: HTML::TreeBuilder, nesting with header vs font, different behavior
by jonadab
in thread HTML::TreeBuilder, nesting with header vs font, different behavior
by tphyahoo