Hello
I'm using HTML::TreeBuilder::XPath to extract data from an HTML page, but I don't understand very well how it works. Basically, I want to get the value inside "<div class="here">", but one outer block at a time. I wrote an example based on the documentation, but it doesn't work; see below:
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(do { local($/); <DATA> });

for my $result ($tree->findnodes(q{/html/body/div})) {
    print $result->findvalue(q{//div[@class="here"]});
    print "<br>" . ("-" x 120) . "<br>";
}

__DATA__
<html>
<body>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
</body>
</html>
It prints this:
this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +--------------------------------------------------
So the solution I came up with was this:
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(do { local($/); <DATA> });

for my $result ($tree->findnodes(q{/html/body/div})) {
    my $x = HTML::TreeBuilder::XPath->new;
    $x->parse($result->as_HTML);
    print $x->findvalue(q{//div[@class="here"]});
    print "<br>" . ("-" x 17) . "<br>";
}

__DATA__
<html>
<body>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
  <div> <div class="here">this's the value</div> </div>
</body>
</html>
It prints this:
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
But I don't think this is pretty code. What is the correct way to do this, and what is wrong with the first example?
Thank you in advance
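A likely explanation for the first example, offered as a sketch rather than a definitive answer: in XPath, an expression that starts with "//" is evaluated from the document root, even when it is called on an inner node, so every iteration matches all six "here" divs. Prefixing the expression with "." makes it relative to the current node. The sample values below ("first", "second", "third") and the variable names $html, $outer and @values are made up for illustration so that each block is distinguishable.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Illustrative sample data (not the original page).
my $html = <<'HTML';
<html><body>
<div><div class="here">first</div></div>
<div><div class="here">second</div></div>
<div><div class="here">third</div></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

my @values;
for my $outer ($tree->findnodes(q{/html/body/div})) {
    # The leading "." anchors the search at $outer instead of the
    # document root, so only this block's own "here" div is matched.
    push @values, $outer->findvalue(q{.//div[@class="here"]});
}

# One value per outer div, in document order.
print "$_\n" for @values;

# Free the parse tree when done.
$tree->delete;
```

With this change there should be no need to re-parse each block into a second HTML::TreeBuilder::XPath object, as the workaround above does.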
In reply to HTML and Xpath by way