Using HTML::TreeBuilder to extract sections spanning multiple tags

Seixon has asked for the wisdom of the Perl Monks concerning the following question:

What I would like to be able to do is define a section of a page that contains several different tags. The HTML in question is:

<table border=0 cellspacing=2 width=100%>
...
<tr>
  <td class=td1 id=centered valign=top><A HREF=http://SOME_LINK>1   </A></td>
  <td class=td1 valign=top>
  <script language="javascript">
  var recordLink="<A HREF=SOME_LINK>1   </A>";
  recordLink = recordLink.substring(0, recordLink.indexOf(">"));
  document.write(recordLink);
  document.write(">");
  </script>
  SOME_TEXT</a></td>
<!--
  <td class=td1 valign=top>SOME_DATA</td>
-->
  </tr>

I'm currently reading it with:

my @comments = $tree->look_down ('_tag', '~comment');
my @lnk_details = $tree->look_down ('_tag', 'script');

The problem is that I have no (easy) way of checking whether a "comment" part actually exists alongside each link (contained in the "script" tag). So I'm reading the two arrays side by side without knowing whether they have the same number of elements, or even whether the element at a given position in "comments" belongs with the element at the same position in "lnk_details".

Is there a way to make TreeBuilder extract sections of an HTML-page defined by consecutive tags, and pair those together for further parsing?

Re: Using HTML::TreeBuilder to extract sections spanning multiple tags by metaperl (Curate) on Feb 03, 2005 at 03:37 UTC
It sounds like you want the `right()` method documented in the HTML::Element module, which ships in the HTML-Tree distribution (of which HTML::TreeBuilder is also a part). You might also look in my HTML::Element::Library for the `siblings()` method, which is a more general approach to the same problem. Good luck,
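
For what it's worth, here is a rough sketch of how `right()` could be used to do the pairing. It is not tested against the real page: the sample HTML, the `LINK_n`/`DATA_n` placeholders and the `@pairs` structure are made up for illustration, and it assumes the comment always sits to the right of the cell holding the script, within the same row. Note that TreeBuilder only keeps comments if `store_comments(1)` is set before parsing.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample: the first row has a commented-out cell, the second does not.
my $html = <<'HTML';
<table>
<tr>
  <td class="td1"><script>var recordLink = "LINK_1";</script>TEXT_1</td>
<!--
  <td class="td1">DATA_1</td>
-->
</tr>
<tr>
  <td class="td1"><script>var recordLink = "LINK_2";</script>TEXT_2</td>
</tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new;
$tree->store_comments(1);    # comments are dropped unless this is set before parsing
$tree->parse($html);
$tree->eof;

my @pairs;
for my $script ( $tree->look_down( _tag => 'script' ) ) {

    # Climb to the enclosing cell, then look only at that cell's right-hand siblings.
    my $cell = $script->look_up( _tag => 'td' ) or next;
    my ($comment) = grep { ref $_ && $_->tag eq '~comment' } $cell->right;

    push @pairs, {
        # as_text() skips script content, so collect the raw text children instead
        link => join( '', grep { !ref } $script->content_list ),
        # undef when this row has no comment
        data => $comment ? $comment->attr('text') : undef,
    };
}

for my $pair (@pairs) {
    my $data = defined $pair->{data} ? $pair->{data} : '(no comment in this row)';
    $data =~ s/^\s+|\s+$//g;    # trim the whitespace around the comment text
    print "$pair->{link}\n  => $data\n";
}

$tree->delete;
```

Because each script is matched only against siblings of its own cell, a row without a comment simply yields `undef` instead of shifting every later pair out of step. Looping over `look_down( _tag => 'tr' )` and searching inside each row would work along the same lines.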
Re^2: Using HTML::TreeBuilder to extract sections spanning multiple tags by Seixon (Initiate) on Feb 03, 2005 at 22:02 UTC
Thanks a lot for taking the time to help me out with such a simple problem!