way has asked for the wisdom of the Perl Monks concerning the following question:
Hello
I'm using HTML::TreeBuilder::XPath to extract data from an HTML page, but I don't fully understand how it works. Basically, I want to get the value inside each "<div class="here">", one outer div at a time. I've made an example based on the documentation, but it doesn't work; see below:
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(do { local $/; <DATA> });   # slurp the HTML below

# Visit each top-level <div> under <body>
for my $result ($tree->findnodes(q{/html/body/div})) {
    print $result->findvalue(q{//div[@class="here"]});
    print "<br>" . ("-" x 120) . "<br>";
}

__DATA__
<html>
  <body>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
  </body>
</html>
It prints this:
this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +-------------------------------------------------- this's the valuethis's the valuethis's the valuethis's the valuethis's + the valuethis's the value ---------------------------------------------------------------------- +--------------------------------------------------
So my solution was this:
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(do { local $/; <DATA> });   # slurp the HTML below

for my $result ($tree->findnodes(q{/html/body/div})) {
    # Workaround: re-parse each outer <div> as its own document
    my $x = HTML::TreeBuilder::XPath->new;
    $x->parse($result->as_HTML);
    $x->eof;   # signal end of input after parse()
    print $x->findvalue(q{//div[@class="here"]});
    print "<br>" . ("-" x 17) . "<br>";
}

__DATA__
<html>
  <body>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
    <div> <div class="here">this's the value</div> </div>
  </body>
</html>
It prints this:
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
this's the value
-----------------
But I don't think this is pretty code. What's the correct way to do this, and what's wrong with the first example?
Thank you in advance
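
A likely explanation for the first example, offered as a sketch rather than a definitive answer: in XPath, an expression that begins with // is evaluated from the document root even when it is called on a sub-node, so every iteration matches all six div.here elements. Prefixing the expression with a dot makes it relative to the node the method is called on. The sample HTML, its values, and the variable names below are illustrative and not from the original post.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Illustrative input: distinct values make it obvious which div matched.
my $html = <<'HTML';
<html><body>
<div><div class="here">first</div></div>
<div><div class="here">second</div></div>
<div><div class="here">third</div></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

my @values;
for my $outer ($tree->findnodes(q{/html/body/div})) {
    # The leading "." anchors the search at $outer. A bare "//" would
    # restart from the document root and match every div.here on the page.
    push @values, $outer->findvalue(q{.//div[@class="here"]});
}

print "$_\n" for @values;

$tree->delete;   # free the tree once it is no longer needed
```

With the relative path, each pass through the loop should yield only the value nested inside the current outer div, so no second parser per node is needed.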
Re: HTML and Xpath
by mirod (Canon) on Nov 06, 2008 at 15:04 UTC
by way (Sexton) on Nov 06, 2008 at 16:05 UTC
by mirod (Canon) on Nov 06, 2008 at 16:43 UTC

Re: HTML and Xpath
by Anonymous Monk on Nov 06, 2008 at 15:02 UTC
by Anonymous Monk on Nov 06, 2008 at 15:09 UTC
by way (Sexton) on Nov 06, 2008 at 15:56 UTC
by Anonymous Monk on Aug 17, 2009 at 11:10 UTC
by way (Sexton) on Nov 06, 2008 at 15:53 UTC

Re: HTML and Xpath
by ikegami (Patriarch) on Nov 06, 2008 at 20:38 UTC
by way (Sexton) on Nov 15, 2008 at 20:12 UTC