in reply to Traversing an HTMLTree with HTML:Element ->right

Update: This is all wrong. :-( I misread the HTML.

See my second attempt.

This might help illustrate "siblings".

#!/usr/bin/perl
use warnings;
use strict;
use lib q{/www/lib};
use SW::Debug;
use HTML::TreeBuilder;

my $content = do{local $/;<DATA>};
my $t = HTML::TreeBuilder->new_from_content($content)
    or die qq{cant build tree};

my $body = $t->look_down(_tag => q{body});
my $p    = $t->look_down(_tag => q{p});

my @right = $p->right;
print scalar @right, qq{ siblings\n};
for my $ele (@right){
    $ele->dump;
    print q{-} x 20, qq{\n};
}
__DATA__
<p id = "1">one</p>
<p id = "2">two</p>
<p id = "3">three</p>

2 siblings
<p id="2"> @0.1.1
  "two"
----------
<p id="3"> @0.1.2
  "three"
----------
Which I don't think is what you're after. In the
foreach my $Node (@PageSections)
loop you want to look down for divs with class = "vcard" and then look down into each of those to get the data you need.

The "vcard" divs are children of the "secTitle" divs rather than siblings.

HTH

Re^2: Traversing an HTMLTree with HTML:Element ->right
by wfsp (Abbot) on Jun 30, 2009 at 09:36 UTC
    Here's another go. :-)
    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::TreeBuilder;

    my $t = HTML::TreeBuilder->new_from_file(q{html/monk.html})
        or die qq{cant build tree};

    my $body = $t->look_down(_tag => q{body});
    my @PageSections = $body->look_down(
        _tag  => q{div},
        class => q{secTitle},
    );

    my $i;
    for my $node (@PageSections){
        my $secTitle = $node->as_text;
        print qq{>>>>> $secTitle\n};
        my @right = $node->right;
        for my $ele (@right){
            if (ref $ele){
                # stop at the next section heading
                last if(
                    $ele->tag eq q{div}
                    and $ele->attr(q{class})
                    and $ele->attr(q{class}) eq q{secTitle}
                );
                $ele->dump;
            }
            else{
                # plain text sibling, not an HTML::Element
                print $ele, qq{\n};
            }
        }
        print q{-} x 20, qq{\n};
    }
      Re: The "vcard" divs are children of the "secTitle" divs rather than siblings.

      Oh how I wish that were true.
      Here is a snippet of the Husband portion of the page.
      I think my problem is that my iteration with ->right() assumes the ->content array of the parent node contains only references to HTML::Element objects. This assumption is wrong: the ->content array of an HTML::Element is a mix of text scalars and references, as your code demonstrates with its use of if (ref $ele). This HTML snippet from the test page shows the same thing:
      <span class='BogusClasNameForThisExample'>
        Father: <a href="../F247/F247134.htm"> John Leonard </a>
        Mother: <a href="../F247/F247134.htm"> Sarah Leonard </a>
      </span>
      There are 4 elements in the ->content array of the span node:
      • Scalar text: Father
      • HTML::Element for <a href="../F247/F247134.htm">
      • Scalar text: Mother
      • HTML::Element for <a href="../F247/F247134.htm">
      I think I need to use ->objectify_text() and ->deobjectify_text(). The first should wrap each text segment in an HTML::Element, so iterating with ->right() should then work.
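
      A minimal, untested sketch of that idea, assuming the span snippet above is parsed on its own rather than from the real page. objectify_text turns each text segment into a pseudo-element whose tag is "~text" and whose content sits in a "text" attribute, so every sibling returned by ->right() is then an object. The variable names and the print format are only illustrative.

      ```perl
      #!/usr/bin/perl
      use warnings;
      use strict;
      use HTML::TreeBuilder;

      # Assumption: a standalone copy of the span snippet, not the real page.
      my $html = q{<span class='BogusClasNameForThisExample'>}
               . q{Father: <a href="../F247/F247134.htm">John Leonard</a> }
               . q{Mother: <a href="../F247/F247134.htm">Sarah Leonard</a>}
               . q{</span>};

      my $t = HTML::TreeBuilder->new_from_content($html)
          or die qq{cant build tree};
      my $span = $t->look_down(_tag => q{span})
          or die qq{no span found};

      # Wrap every text segment in a '~text' pseudo-element.
      $t->objectify_text;

      # Start at the first child and walk its right-hand siblings.
      my ($first) = $span->content_list;
      for my $ele ($first, $first->right){
          if ($ele->tag eq q{~text}){
              print q{text: }, $ele->attr(q{text}), qq{\n};
          }
          else{
              print q{element: <}, $ele->tag, q{> }, $ele->as_text, qq{\n};
          }
      }

      # Turn the pseudo-elements back into plain text scalars.
      $t->deobjectify_text;
      ```

      Calling deobjectify_text afterwards restores the tree to its usual mix of scalars and elements, which matters if later code relies on the ref() test.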

      I will let you know the results when I get home tonight.