Hello

I'm using HTML::TreeBuilder::XPath to extract data from an HTML page, but I don't fully understand how it works. Basically, I want to get the value inside each "<div class="here">", one row at a time. I wrote an example based on the documentation, but it doesn't work as I expected. See below:

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(do { local($/); <DATA>});

for my $result ($tree->findnodes(q{/html/body/div})) {
    print $result->findvalue(q{//div[@class="here"]});
    print "<br>".("-" x 120)."<br>";
}

__DATA__;
<html>
<body>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
</body>
</html>

It prints this:

this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +--------------------------------------------------

So my workaround was this:

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(do { local($/); <DATA>});

for my $result ($tree->findnodes(q{/html/body/div})) {
    my $x = HTML::TreeBuilder::XPath->new;
    $x->parse($result->as_HTML);
    print $x->findvalue(q{//div[@class="here"]});
    print "<br>".("-" x 17)."<br>";
}

__DATA__;
<html>
<body>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
<div> <div class="here">this's the value</div> </div>
</body>
</html>

It prints this:

this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------

But I don't think this is pretty code. What is the correct way to do this, and what is wrong with the first example?

Thank you in advance
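
A possible explanation, offered as a sketch rather than a confirmed answer: in XPath, an expression that starts with // is evaluated from the document root, even when findvalue is called on a child node. That would explain why every iteration of the first example returns all six values. Prefixing the expression with a dot, as in .//div[@class="here"], makes it relative to the current node, so no second parser is needed. The sample below assumes that behaviour. The $html heredoc and the @values array are illustrative names that are not in the original code, and the data is shortened to three distinct rows so the per-row result is easy to see.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Illustrative sample data (not the original page): three rows with distinct text.
my $html = <<'HTML';
<html>
<body>
<div> <div class="here">first value</div> </div>
<div> <div class="here">second value</div> </div>
<div> <div class="here">third value</div> </div>
</body>
</html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

my @values;
for my $outer ($tree->findnodes(q{/html/body/div})) {
    # The leading "." anchors the search at $outer instead of the document root.
    push @values, $outer->findvalue(q{.//div[@class="here"]});
}

# Print each row's value followed by a separator line.
for my $value (@values) {
    print $value, "\n", ("-" x 17), "\n";
}

# Release the parse tree once we are done with it.
$tree->delete;
```

The only change from the first example that matters is the dot in front of //. The heredoc replaces the DATA section just to keep the sample in one piece.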


In reply to HTML and Xpath by way
