
Here's another go. :-)
#!/usr/bin/perl
use warnings;
use strict;

use HTML::TreeBuilder;

my $t = HTML::TreeBuilder->new_from_file(q{html/monk.html})
    or die qq{can't build tree};

my $body = $t->look_down(_tag => q{body});

my @PageSections = $body->look_down(
    _tag  => q{div},
    class => q{secTitle},
);

for my $node (@PageSections){
    my $secTitle = $node->as_text;
    print qq{>>>>> $secTitle\n};

    # list context: every sibling to the right, text and elements alike
    my @right = $node->right;
    for my $ele (@right){
        if (ref $ele){
            # stop at the next section title
            last if (
                $ele->tag eq q{div}
                and $ele->attr(q{class})
                and $ele->attr(q{class}) eq q{secTitle}
            );
            $ele->dump;
        }
        else{
            # plain text segment: just print it
            print $ele, qq{\n};
        }
    }
    print q{-} x 20, qq{\n};
}
Output:
>>>>> HUSBAND
<div class="vcard" id="hcard-Josiah-Leonard" style="display:inline"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3
  "\x0d\x0a\x09Name: "
  <span class="fn n"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.1
    <a href="../F247/F247134.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.1.0
      "Josiah Leonard"
  <span style="display:none"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.2
    <span class="x-gender"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.2.0
      "Male"
  " "
  <a href="#Note1"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.4
    "Note"
  "\x0d\x0a Born: \x0d\x0a Married: 2 Nov 1699"
  <span style="display: none"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.6
    <span class="x-marriage-date"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.6.0
      "1699-11-2"
  " at Bridgewater, Plymouth, MA"
  <span style="display: none"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.8
    <span class="x-marriage-location"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.8.0
      "Bridgewater, Plymouth, MA"
  "\x0d\x0a Died: Abt 1745"
  <span style="display: none"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.10
    <span class="x-death-date"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.10.0
      "1745-1-1"
  " at Bridgewater, Plymouth, MA"
  <span style="display: none"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.12
    <span class="x-death-location"> @0.1.0.0.0.0.4.3.0.0.0.3.2.3.12.0
      "Bridgewater, Plymouth, MA"
  "\x0d\x0a"
Other Spouses:
<a href="../F248/F248346.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.5
  "Abigail Washburn"
<br /> @0.1.0.0.0.0.4.3.0.0.0.3.2.6
Father:
<a href="../F247/F247134.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.8
  "John Leonard"
Mother:
<a href="../F247/F247134.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.10
  "Sarah Leonard"
<br /> @0.1.0.0.0.0.4.3.0.0.0.3.2.12
--------------------
>>>>> WIFE
<div class="vcard" id="hcard-Marjoram-Washburn" style="display:inline"> @0.1.0.0.0.0.4.3.0.0.0.3.2.16
  " Name: "
  <span class="fn n"> @0.1.0.0.0.0.4.3.0.0.0.3.2.16.1
    <a href="../F248/F248872.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.16.1.0
      "Marjoram Washburn"
  " "
  <a href="#Note2"> @0.1.0.0.0.0.4.3.0.0.0.3.2.16.3
    "Note"
Born: Died: Bef 21 Nov 1717 Father:
<a href="../F248/F248872.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.18
  "Philip Washburn"
Mother:
<a href="../F248/F248872.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.20
  "Elizabeth Irish"
<br /> @0.1.0.0.0.0.4.3.0.0.0.3.2.22
--------------------
>>>>> CHILDREN
<div class="vcard" id="hcard-John-Leonard" style="display:inline"> @0.1.0.0.0.0.4.3.0.0.0.3.2.25
  " Name: "
  <span class="fn n"> @0.1.0.0.0.0.4.3.0.0.0.3.2.25.1
    <a href="../F236/F236503.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.25.1.0
      "John Leonard"
Born: Bet 1695 and 1723 Died: Bet 1748 and 1808 Wife:
<a href="../F236/F236503.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.27
  "Anna Noble"
<br /> @0.1.0.0.0.0.4.3.0.0.0.3.2.29
<div class="vcard" id="hcard-Elizabeth-Leonard" style="display:inline"> @0.1.0.0.0.0.4.3.0.0.0.3.2.31
  " Name: "
  <span class="fn n"> @0.1.0.0.0.0.4.3.0.0.0.3.2.31.1
    <a href="../F248/F248348.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.31.1.0
      "Elizabeth Leonard"
Born: Abt 1702 Died: 14 Oct 1783 at Bridgewater, Plymouth, MA Husband:
<a href="../F248/F248348.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.33
  "James Washburn"
<br /> @0.1.0.0.0.0.4.3.0.0.0.3.2.35
<div class="vcard" id="hcard-Josiah-Leonard" style="display:inline"> @0.1.0.0.0.0.4.3.0.0.0.3.2.37
  " Name: "
  <span class="fn n"> @0.1.0.0.0.0.4.3.0.0.0.3.2.37.1
    "Josiah Leonard"
Born: Abt 1704 Died:
<br /> @0.1.0.0.0.0.4.3.0.0.0.3.2.39
<div class="vcard" id="hcard-Mary-Leonard" style="display:inline"> @0.1.0.0.0.0.4.3.0.0.0.3.2.41
  " Name: "
  <span class="fn n"> @0.1.0.0.0.0.4.3.0.0.0.3.2.41.1
    <a href="../F248/F248350.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.41.1.0
      "Mary Leonard"
Born: Bef 1710 Died: 1793 at Marlborough, MA Husband:
<a href="../F248/F248350.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.43
  "Daniel Herrington"
<br /> @0.1.0.0.0.0.4.3.0.0.0.3.2.45
<div class="vcard" id="hcard-Margene-Leonard" style="display:inline"> @0.1.0.0.0.0.4.3.0.0.0.3.2.47
  " Name: "
  <span class="fn n"> @0.1.0.0.0.0.4.3.0.0.0.3.2.47.1
    <a href="../F248/F248351.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.47.1.0
      "Margene Leonard"
Born: Abt 1710 Died: Husband:
<a href="../F248/F248351.htm"> @0.1.0.0.0.0.4.3.0.0.0.3.2.49
  "Nathaniel Pratt"
<br /> @0.1.0.0.0.0.4.3.0.0.0.3.2.51
--------------------
>>>>> NOTES
--------------------

Re^3: Traversing an HTMLTree with HTML:Element ->right
by Anonymous Monk on Jun 30, 2009 at 13:24 UTC
    Re: The "vcard" divs are children of the "secTitle" divs rather than siblings.

    Oh how I wish that were true.
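    One way to settle the child-versus-sibling question on the real page is to ask each vcard div for its parent. The sketch below only illustrates that check: the markup is a hypothetical, cut-down stand-in for html/monk.html (which isn't reproduced in this thread), deliberately written with the vcard divs as siblings of the secTitle divs. On the real file you would build the tree with new_from_file instead of new_from_content.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for html/monk.html: the vcard divs are written
# as siblings of the secTitle divs, which is the layout in question.
my $html = <<'HTML';
<body>
<div class="secTitle">HUSBAND</div>
<div class="vcard">Josiah Leonard</div>
<div class="secTitle">WIFE</div>
<div class="vcard">Marjoram Washburn</div>
</body>
HTML

my $t = HTML::TreeBuilder->new_from_content($html);

# Record the tag and class of each vcard div's parent.
my %parent_of;
for my $vcard ( $t->look_down( _tag => 'div', class => 'vcard' ) ) {
    my $parent = $vcard->parent;
    my $class  = $parent->attr('class') // '(no class)';
    $parent_of{ $vcard->as_text } = $parent->tag . " / $class";
    print $vcard->as_text, ' -> parent: <', $parent->tag, "> class=$class\n";
}

$t->delete;    # explicit cleanup, as the HTML::TreeBuilder docs recommend
```

    If the real page reports a secTitle div as the parent, the vcard divs are children after all; if it reports body (or some wrapper), they are siblings and a ->right walk is the right tool.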
    Here is a snippet of the Husband portion of the page.
    I think my problem is that my iteration with ->right() assumes the ->content array of the parent node contains only references to HTML::Element objects. That assumption is wrong: the ->content array of an HTML::Element is a mix of text scalars and element references, as your code demonstrates with its if (ref $ele) test. This HTML snippet from the test page shows the mix:
    <span class='BogusClasNameForThisExample'> Father: <a href="../F247/F247134.htm"> John Leonard </a> Mother: <a href="../F247/F247134.htm"> Sarah Leonard </a> </span>
    There are four entries in the ->content array of the span node:
    • Scalar text: Father
    • HTML::Element for <a href="../F247/F247134.htm">
    • Scalar text: Mother
    • HTML::Element for <a href="../F247/F247134.htm">
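    To see the mix directly, the snippet can be parsed on its own and each content entry classified with ref, the same test the script at the top of this thread uses. This is a sketch using new_from_content on the literal snippet. Depending on TreeBuilder's whitespace settings, the trailing space before </span> may or may not show up as an extra, whitespace-only text entry, so only the first four entries are certain.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# The snippet from the test page, parsed on its own.
my $html = q{<span class='BogusClasNameForThisExample'> Father: <a href="../F247/F247134.htm"> John Leonard </a> Mother: <a href="../F247/F247134.htm"> Sarah Leonard </a> </span>};

my $t    = HTML::TreeBuilder->new_from_content($html);
my $span = $t->look_down( _tag => 'span' );

# content_list returns the span's children in order: plain strings
# for text segments, HTML::Element references for tags.
my @kinds;
for my $item ( $span->content_list ) {
    if ( ref $item ) {
        push @kinds, 'element';
        print "element: <", $item->tag, ">\n";
    }
    else {
        push @kinds, 'text';
        print "text:    '$item'\n";
    }
}

$t->delete;    # explicit cleanup, as the HTML::TreeBuilder docs recommend
```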
    I think I need to use ->objectify_text() and ->deobjectify_text(). objectify_text() should wrap each text segment in an HTML::Element (a "~text" pseudo-element), so iterating with ->right() should then work.
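    A minimal sketch of that plan, using the snippet from the test page: once objectify_text has run, each text segment is an HTML::Element with the tag "~text" and its string in the "text" attribute, so a sibling-by-sibling walk with scalar-context ->right never lands on a plain string. deobjectify_text turns the pseudo-elements back into strings afterwards. The sketch avoids calling as_text while the text is objectified and labels links by their href instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = q{<span class='BogusClasNameForThisExample'> Father: <a href="../F247/F247134.htm"> John Leonard </a> Mother: <a href="../F247/F247134.htm"> Sarah Leonard </a> </span>};

my $t    = HTML::TreeBuilder->new_from_content($html);
my $span = $t->look_down( _tag => 'span' );

# Replace every text segment under $span with a "~text" pseudo-element
# whose string is stored in its "text" attribute.
$span->objectify_text;

# Every child is now an object, so we can step from one sibling to the
# next with scalar-context ->right until it returns undef.
my @labels;
for ( my $node = ( $span->content_list )[0]; defined $node; $node = $node->right ) {
    if ( $node->tag eq '~text' ) {
        push @labels, 'text: ' . $node->attr('text');
    }
    else {
        push @labels, '<' . $node->tag . ' href=' . $node->attr('href') . '>';
    }
}
print "$_\n" for @labels;

# Restore plain-string text nodes before doing anything else with the tree.
$span->deobjectify_text;
$t->delete;
```

    In the section loop, the same idea would mean calling objectify_text on $body before walking, and reading attr('text') in place of the raw string in the text branch.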

    I will let you know the results when I get home tonight.