davies has asked for the wisdom of the Perl Monks concerning the following question:
X: I am writing something that uses Dancer2 and Template::Toolkit to produce HTML. I want to write tests along the lines of "is the fourth column of the third row of the second table what I expect?".
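A minimal sketch of such a test uses HTML::TreeBuilder's `look_down` method, which returns matching elements in pre-order (that is, document order), together with Test::More. The sample HTML and the helper name `cell_text` are hypothetical, and the sketch assumes the tables are not nested.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Test::More;

# Hypothetical page standing in for the Dancer2/Template::Toolkit output.
my $html = <<'HTML';
<html><body>
<table><tr><td>ignored</td></tr></table>
<table>
<tr><th>A</th><th>B</th><th>C</th><th>D</th></tr>
<tr><td>1</td><td>2</td><td>3</td><td>4</td></tr>
<tr><td>5</td><td>6</td><td>7</td><td>8</td></tr>
</table>
</body></html>
HTML

# Hypothetical helper: text of the cell at (table, row, column), each
# counted from 1. Returns undef if any index is out of range.
sub cell_text {
    my ($tree, $table_no, $row_no, $col_no) = @_;
    my @tables = $tree->look_down(_tag => 'table');   # pre-order = document order
    my $table  = $tables[$table_no - 1] or return;
    my @rows   = $table->look_down(_tag => 'tr');     # also finds rows inside <tbody>
    my $row    = $rows[$row_no - 1] or return;
    # Direct children only; text nodes are plain strings, so skip non-refs.
    my @cells  = grep { ref $_ && $_->tag =~ /^t[dh]$/ } $row->content_list;
    my $cell   = $cells[$col_no - 1] or return;
    return $cell->as_text;
}

my $tree = HTML::TreeBuilder->new_from_content($html);
my $got  = cell_text($tree, 2, 3, 4);
is($got, '8', 'table 2, row 3, column 4');
$tree->delete;    # destroy the tree explicitly (elements link back to parents)
done_testing();
```

Using `as_text` instead of reaching into `{'_content'}` keeps the test independent of HTML::Element's internal hash layout.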
Y: I am using HTML::TreeBuilder. The documentation for this is sending me off on a yak shaving exercise that is getting me frustrated. So:
While you could access the content of a tree by writing code that says "access the 'src' attribute of the root's first child's seventh child's third child", you're more likely to have to scan the contents of a tree, looking for whatever nodes, or kinds of nodes, you want to do something with. The most straightforward way to look over a tree is to "traverse" it; an HTML::Element method ($h->traverse) is provided for this purpose.

That looks helpful, until you read the relevant section, which says:
Lengthy discussion of HTML::Element's unnecessary and confusing traverse method has been moved to a separate file: HTML::Element::traverse
So I go there, to find this:
or you can just be simple and clear (and not have to understand the calling format for traverse) by writing a sub that traverses the tree by just calling itself:

```perl
{
  my $counter = 'x0000';
  sub give_id {
    my $x = $_[0];
    $x->attr('id', $counter++) unless defined $x->attr('id');
    foreach my $c ($x->content_list) {
      give_id($c) if ref $c;  # ignore text nodes
    }
  };
  give_id($start_node);
}
```

See, isn't that nice and clear?
No, it's foul and opaque. I can't see what the purpose of it is, nor can I see how it achieves its purpose, nor can I see how to hack this to give me table 2, row 3, column 4. I do have some working code that looks like:
```perl
my $tree = HTML::TreeBuilder->new_from_content($html);
$tree->elementify();
my $tagmap = $tree->tagname_map();
ok('Matrix' eq $$tagmap{'h2'}[1]{'_content'}[0], "Got correct title (Matrix)");
```
but every time I look at one of these elements, I get things like '_parent' => $VAR1->{'_parent'}{'_parent'}{'_parent'}{'_parent'}{'_parent'},, leading me to suspect that every element object contains a reworked copy of the entire HTML tree and getting me no closer to what I want.
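Those `$VAR1->{'_parent'}...` entries are Data::Dumper's notation for a reference it has already printed. Each element's `_parent` points back at the one parent object, so the tree is not copied into every element. The minimal sketch below illustrates this by comparing addresses with Scalar::Util's `refaddr`. It also calls HTML::Element's `dump` method, which prints an indented outline of the tree and is usually easier to read than Data::Dumper output. The sample HTML is hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Scalar::Util qw(refaddr);

# Hypothetical sample page.
my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><h2>Intro</h2><h2>Matrix</h2></body></html>');

my $body = $tree->look_down(_tag => 'body');      # scalar context: first match
my $h2   = ($tree->look_down(_tag => 'h2'))[1];   # the second <h2>

# parent() hands back the very same object look_down found, not a copy.
my $same = refaddr($h2->parent) == refaddr($body);
print 'parent is the same object: ', ($same ? 'yes' : 'no'), "\n";

# A public accessor instead of hash internals such as {'_content'}[0].
my $title = $h2->as_text;
print "title: $title\n";

# Indented outline of the whole tree, for debugging.
$tree->dump;

$tree->delete;    # destroy the tree explicitly (the parent links are circular)
```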
As I have indicated, I've tried reading the docs but have found them unhelpful. I've looked for external tutorials without great success (although I've seen lots of suggestions for using other modules, investigation of which has also cost time without making progress). Am I using a sensible tool (and if not, what should I use)? How should I be using it to do something I did not expect to be difficult?
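The `give_id` example quoted from the HTML::Element::traverse documentation is an ordinary depth-first recursion. It handles the current element, then calls itself on every child that is a reference. Text nodes are plain strings, which is why it tests `if ref $c`. The sketch below repurposes the same pattern to collect elements by tag name instead of assigning ids, then uses it to reach table 2, row 3, column 4. The helper `collect_tags` and the sample HTML are hypothetical, and nested tables are not handled.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper in the style of give_id: visit $element, then
# recurse into each child element. Matches are pushed onto @$found in
# document order (depth-first, parent before children).
sub collect_tags {
    my ($element, $tag, $found) = @_;
    push @$found, $element if $element->tag eq $tag;
    foreach my $child ($element->content_list) {
        collect_tags($child, $tag, $found) if ref $child;   # skip text nodes
    }
    return $found;
}

# Hypothetical sample page with two tables.
my $html = <<'HTML';
<table><tr><td>first table</td></tr></table>
<table>
<tr><td>r1c1</td><td>r1c2</td><td>r1c3</td><td>r1c4</td></tr>
<tr><td>r2c1</td><td>r2c2</td><td>r2c3</td><td>r2c4</td></tr>
<tr><td>r3c1</td><td>r3c2</td><td>r3c3</td><td>r3c4</td></tr>
</table>
HTML

my $tree   = HTML::TreeBuilder->new_from_content($html);
my $tables = collect_tags($tree, 'table', []);
my $rows   = collect_tags($tables->[1], 'tr', []);   # 0-based: [1] is table 2
my $cells  = collect_tags($rows->[2], 'td', []);     # row 3
my $text   = $cells->[3]->as_text;                   # column 4
print "$text\n";
my $table_count = @$tables;
$tree->delete;
```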
Regards,
John Davies
Update: a combination of the answers with some help from Berends has got me working. The CSS and other definitions I had brought in from Bootstrap used <meta ...> tags that don't play nicely with XML. However, a simple change to <meta .../> (and the same for link tags) was enough for the XML parser to play nicely with me. shmem's advice to use XPath also works well, so now my code looks like:

```perl
use XML::XPath;
use XML::XPath::XMLParser;

# Serialize the first node matching $xpath and return the text
# between its outermost opening and closing tags.
sub textfromxml {
    my ($xp, $xpath) = @_;
    my $nodeset = $xp->find($xpath);
    my ($text) = XML::XPath::XMLParser::as_string(($nodeset->get_nodelist)[0]) =~ />(.*)</;
    return $text;
}

my $xp   = XML::XPath->new(xml => $html);
my $text = textfromxml($xp, '/html/body/div/div/h2[2]');
ok('Matrix' eq $text, "Got correct title (Matrix)");
```

(snippage hath occurred; I may have blundered) and all tests pass with indexing done in the XPath. Thanks all.
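With XML::XPath, the table/row/column lookup from the original question can go into a single expression. This is a minimal sketch. It assumes the page is well-formed XML, as in the update, and that `<html>` carries no default `xmlns` namespace; with one, the unprefixed names below would not match. The sample markup is hypothetical. The sketch uses `findvalue`, which returns the text of the match (as an object that stringifies to it), so the `>(.*)<` regex is not needed.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical well-formed page (self-closed <meta/>, as in the update).
my $html = <<'XHTML';
<html><head><meta charset="utf-8"/></head><body>
<table><tr><td>first table</td></tr></table>
<table><tbody>
<tr><td>r1c1</td><td>r1c2</td><td>r1c3</td><td>r1c4</td></tr>
<tr><td>r2c1</td><td>r2c2</td><td>r2c3</td><td>r2c4</td></tr>
<tr><td>r3c1</td><td>r3c2</td><td>r3c3</td><td>r3c4</td></tr>
</tbody></table>
</body></html>
XHTML

my $xp = XML::XPath->new(xml => $html);

# (//table)[2] : the second <table> in the whole document. By contrast,
#                //table[2] means "each table that is the 2nd table child of its parent".
# (...//tr)[3] : the third row of that table, whether or not it sits in a <tbody>.
# /td[4]       : the fourth cell of that row.
my $cell = $xp->findvalue('((//table)[2]//tr)[3]/td[4]');
print "cell: $cell\n";
```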
Re: Testing generated HTML
by choroba (Cardinal) on Feb 21, 2016 at 17:16 UTC
by davies (Monsignor) on Feb 21, 2016 at 19:21 UTC
by choroba (Cardinal) on Feb 21, 2016 at 20:09 UTC
by Myrddin Wyllt (Hermit) on Feb 22, 2016 at 13:44 UTC
by davies (Monsignor) on Feb 22, 2016 at 14:01 UTC

Re: Testing generated HTML
by shmem (Chancellor) on Feb 21, 2016 at 21:06 UTC