I've only used Perl for 2 weeks, so apologies for my likely ignorance.
I want to write a script that downloads a web page and renders part of that page in wiki format. I appreciate that there is an HTML::WikiConverter module, but I would like to implement this myself, partly because I only want to render some elements of the HTML. I will be using HTML::Tree.
The first step is to build the tree. That appears straightforward:
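```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;    # the tree builder from the HTML::Tree distribution
use LWP::Simple;          # exports getstore() and is_success()

# getstore() returns the HTTP status code (e.g. 404), which is true even
# on failure, so check it with is_success() instead of a bare "or die".
is_success(getstore("http://www.guardian.co.uk", "guardian.htm"))
    or die "Cannot get the page.\n";

# parse_file is a method, so it has to be called on the builder object.
my $tree = HTML::TreeBuilder->new();
$tree->parse_file("guardian.htm") or die "Cannot parse guardian.htm: $!\n";
```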
In pseudocode: I want to look at each element of the page. For each element, if its tag is one I'm interested in, I take the text of the element and render it in wiki format.
I just don't understand how to loop through all the elements. A discussion in the HTML::Tree documentation suggests a recursive method of accessing all the elements:

```perl
{
    my $counter = 'x0000';    # string increment gives x0001, x0002, ...
    sub give_id {
        my $x = $_[0];
        # Give this element an id unless it already has one
        $x->attr('id', $counter++) unless defined $x->attr('id');
        # Then recurse into each child element
        foreach my $c ($x->content_list) {
            give_id($c) if ref $c;    # ignore text nodes
        }
    }
    give_id($start_node);
}
```

But I don't understand this code and can't adapt it.
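The give_id routine is a depth-first walk: it does something to the current element, then calls itself on each child that is a reference (text nodes are plain strings, so `ref` skips them). Dropping the id-setting line and passing in a code reference turns it into a general "visit every element" loop. A minimal sketch, where `walk` is a hypothetical helper name and a short inline HTML string stands in for guardian.htm:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: same shape as give_id, but instead of setting an
# id it calls $callback on every element, parents before children.
sub walk {
    my ($node, $callback) = @_;
    $callback->($node);
    foreach my $child ($node->content_list) {
        walk($child, $callback) if ref $child;    # ignore text nodes
    }
}

my $tree = HTML::TreeBuilder->new;
$tree->parse('<h1>News</h1><p>Hello</p><h2>Sport</h2>');
$tree->eof;

# Collect the tag name of every element, in document order
my @tags;
walk($tree, sub { push @tags, $_[0]->tag });
print "@tags\n";

$tree->delete;    # release the tree when done
```

Inside the callback, `$_[0]` is the current element, so the tag test from the next step can go there.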
Once I have a way of looping over each element, I propose processing them like this:
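```perl
# ('h1' or 'h2') evaluates to just 'h1', so compare the tag twice
if ($element->tag eq 'h1' or $element->tag eq 'h2') {
    my $content = $element->as_text();
    print $outfile "====$content====\n";    # $outfile: a handle opened for writing
}
```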
I will have several elsif statements doing something similar with other tags.
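One possible alternative to a long elsif chain is a hash mapping tag names to small formatting subs, looked up during a recursive walk. This is a sketch under assumptions: `%format` and `render` are hypothetical names, the h1/h2 markup follows the post, and the p and li markup is assumed for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical tag => formatter table replacing the elsif chain.
# The h1/h2 markup follows the post; the p and li markup is assumed.
my %format = (
    h1 => sub { "====$_[0]====\n" },
    h2 => sub { "====$_[0]====\n" },
    p  => sub { "$_[0]\n\n" },
    li => sub { "* $_[0]\n" },
);

# Depth-first walk. When an element has a formatter, render its text and
# do not descend further, because as_text already includes the text of
# all its children (otherwise nested text would be emitted twice).
sub render {
    my ($node) = @_;
    if (my $fmt = $format{ $node->tag }) {
        return $fmt->($node->as_text);
    }
    my $out = '';
    foreach my $child ($node->content_list) {
        $out .= render($child) if ref $child;    # skip text nodes
    }
    return $out;
}

my $tree = HTML::TreeBuilder->new;
$tree->parse('<h1>News</h1><p>Some <b>bold</b> text</p><ul><li>One</li></ul>');
$tree->eof;

my $wiki = render($tree);
print $wiki;
$tree->delete;
```

Adding support for another tag then means adding one line to the hash rather than another elsif branch.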
My question, then, is how I can write a loop that lets me look at each element in the tree. (The traverse method is deprecated.)
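If only some tags matter, HTML::Element's `look_down` method can do the recursion for you: in list context it returns every element matching the given criteria, and the `_tag` criterion accepts a regex. A minimal sketch, assuming headings are the only tags of interest and using an inline HTML string in place of the downloaded page:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse('<h1>News</h1><div><h2>Sport</h2><p>Scores</p></div>');
$tree->eof;

# look_down searches the whole tree (including nested elements such as
# the h2 inside the div) and returns the matches in document order.
my @headings = $tree->look_down(_tag => qr/^h[12]$/);

my @lines;
foreach my $element (@headings) {
    push @lines, '====' . $element->as_text . '====';
}
print "$_\n" for @lines;

$tree->delete;
```

This avoids writing a recursive sub at all; the explicit walk is still useful when the output depends on where an element sits in the tree.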
In reply to Inspecting each element in a tree, specifically HTML::Tree by hulot