Seixon has asked for the wisdom of the Perl Monks concerning the following question:

What I would like to do is define a section of a page that contains several different tags. The HTML in question is:
<table border=0 cellspacing=2 width=100%>
...
<tr>
  <td class=td1 id=centered valign=top><A HREF=http://SOME_LINK>1 </A></td>
  <td class=td1 valign=top>
    <script language="javascript">
      var recordLink="<A HREF=SOME_LINK>1 </A>";
      recordLink = recordLink.substring(0, recordLink.indexOf(">"));
      document.write(recordLink);
      document.write(">");
    </script>
    SOME_TEXT</a></td>
  <!-- <td class=td1 valign=top>SOME_DATA</td> -->
</tr>
I'm currently reading it with:
my @comments    = $tree->look_down('_tag', '~comment');
my @lnk_details = $tree->look_down('_tag', 'script');
The problem with this is that I have no (easy) way to check whether a "comment" part actually exists together with the link (contained in the "script" tag). So I'm reading the two arrays side by side without knowing whether they have the same number of elements, or even whether an element in "comments" belongs with the element at the same position in "lnk_details".

Is there a way to make TreeBuilder extract sections of an HTML-page defined by consecutive tags, and pair those together for further parsing?

Replies are listed 'Best First'.
Re: Using HTML::TreeBuilder to extract sections spanning multiple tags
by metaperl (Curate) on Feb 03, 2005 at 03:37 UTC
    It sounds like you want the right() method documented in the HTML::Element module, which comes in the HTML::Tree distribution (of which HTML::TreeBuilder is also a part).

    You might also look in my HTML::Element::Library for the siblings() method, which is a more general approach to the same problem.
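A minimal sketch of the right() idea, assuming the comment is a sibling of the table cell that holds the script, as in the HTML in the question. The sample markup, the variable names, and the choice to collect plain strings are illustrative assumptions, not part of the original post. Note that store_comments(1) has to be set before parsing, since TreeBuilder otherwise discards comments.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input modelled on the markup in the question:
# the first row has a trailing comment, the second row does not.
my $html = <<'HTML';
<table>
<tr><td>1</td><td><script>var a = "first";</script> TEXT_A</td><!-- DATA_A --></tr>
<tr><td>2</td><td><script>var b = "second";</script> TEXT_B</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new;
$tree->store_comments(1);    # keep ~comment nodes; must be set before parsing
$tree->parse($html);
$tree->eof;

my @pairs;
for my $script ( $tree->look_down( '_tag', 'script' ) ) {

    # Climb from the script to the cell that encloses it.
    my $cell = $script->look_up( '_tag', 'td' ) or next;

    # In list context right() returns every right-hand sibling of the cell.
    # Siblings can be plain text strings, so test ref() before calling tag().
    my ($comment) = grep { ref $_ && $_->tag eq '~comment' } $cell->right;

    push @pairs, {
        script  => join( '', grep { !ref $_ } $script->content_list ),
        comment => defined $comment ? $comment->attr('text') : undef,
    };
}

$tree->delete;    # safe here because @pairs holds only plain strings

for my $pair (@pairs) {
    my $note = defined $pair->{comment} ? $pair->{comment} : '(no comment)';
    print "$pair->{script} => $note\n";
}
```

Because right() only looks at siblings under the same parent, a script is never paired with a comment from a different row, and a row without a comment simply yields undef instead of shifting the two lists out of step.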

    Good luck,

      Thanks a lot for taking the time to help me out with such a simple problem!