PerlMonks  

Re: HTML::TreeBuilder scan for first table ( HTML::TreeBuilder::XPath )

by Anonymous Monk
on Jan 22, 2016 at 00:32 UTC


in reply to HTML::TreeBuilder scan for first table

Use HTML::TreeBuilder::XPath, together with htmltreexpather.pl / xpather.pl and the examples (for tree-xpath and others), walkthroughs, tutorials ...

Write something like

    use HTML::TreeBuilder::XPath;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content( $content );
    my @headers = $tree->findnodes( q{ //table[@class='HeaderFrame' ] } )
        ->shift ## get first one
        ->findvalues( q{ .//td[@class='HeaderTitle'] } );
    print "@headers\n";

Or even all in one xpath expression

    my @headers = $tree->findvalues( q{
        (
            //table[ @class = 'HeaderFrame' ]
        )[1]
        //td[ @class = 'HeaderTitle' ]
    } );
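To try this out on a small page, here is a minimal sketch. The sample HTML and the names @first and @all are made up for illustration; only the two class names come from the snippets above. It compares the parenthesised form, which keeps just the first matching table, with the same expression without the [1] filter.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page: two HeaderFrame tables, only the first is wanted.
my $content = <<'HTML';
<html><body>
<table class="HeaderFrame"><tr>
  <td class="HeaderTitle">Name</td><td class="HeaderTitle">Date</td>
</tr></table>
<table class="HeaderFrame"><tr>
  <td class="HeaderTitle">Other</td>
</tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($content);

# Parenthesised form: collect every matching table, then keep the first one.
my @first = $tree->findvalues(
    q{(//table[@class='HeaderFrame'])[1]//td[@class='HeaderTitle']} );

# Without the [1] filter, every matching table contributes its cells.
my @all = $tree->findvalues(
    q{//table[@class='HeaderFrame']//td[@class='HeaderTitle']} );

print "first table: @first\n";
print "all tables:  @all\n";

$tree->delete;    # free the tree once the values have been extracted
```

If the two printed lists differ, the [1] filter is doing its job; if they are the same, the page probably has only one table with that class.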

Re^2: HTML::TreeBuilder scan for first table ( HTML::TreeBuilder::XPath )
by mazdajai (Novice) on Jan 22, 2016 at 15:46 UTC
    Thanks for the suggestions, everyone. I will try them. Is there an easier way to inspect the tree elements in TreeBuilder or TableExtract? For example, there are online parsers where you can test a regex; I am curious whether there is anything similar that can help debug when an element is not being retrieved as expected.

      > Is there an easier way to inspect the tree elements in TreeBuilder or TableExtract?

      Which name is mentioned in the code?

      > For example, there are online parsers where you can test a regex; I am curious whether there is anything similar that can help debug when an element is not being retrieved as expected.

      The *xpather*s (htmltreexpather.pl / xpather.pl) help you craft XPaths you can use to retrieve the stuff you want.

      When the HTML changes significantly, you run the *xpather*s again to craft new XPaths.
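      For a quick look without those scripts, something like the sketch below may help when an XPath does not return what you expect. It assumes the XPath selects elements (not text or attribute nodes); the sample HTML and the helper name describe_matches are hypothetical. It relies only on the HTML::Element methods tag, attr, address, as_text, and dump.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup standing in for the real page.
my $content = <<'HTML';
<html><body>
<table class="HeaderFrame"><tr>
  <td class="HeaderTitle">Name</td><td class="Other">x</td>
</tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($content);

# Hypothetical helper: describe each element an XPath matches
# (tag, class attribute, position in the tree, text content).
sub describe_matches {
    my ( $root, $xpath ) = @_;
    return map {
        sprintf '%s class=%s address=%s text=%s',
            $_->tag, $_->attr('class') // '(none)',
            $_->address, $_->as_text
    } $root->findnodes($xpath);
}

my @report = describe_matches( $tree, q{//td} );
print "$_\n" for @report;

# Dump the whole parsed tree (indented, to STDOUT) to see what the parser built.
$tree->dump;

$tree->delete;
```

      Comparing the dump with the original HTML often shows why an expression misses, for example when the class name in the page differs from the one in the XPath.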

Re^2: HTML::TreeBuilder scan for first table ( HTML::TreeBuilder::XPath )
by mr_ron (Chaplain) on Jan 25, 2016 at 21:25 UTC

    Sorry - Guess I got lost in the formatting of the post I replied to. I may not have seen XPath formatted this way before.

    > Or even all in one xpath expression
    >
    > my @headers = $tree->findvalues( q{
    >     (
    >         //table[ @class = 'HeaderFrame' ]
    >     )[1]
    >     //td[ @class = 'HeaderTitle' ]
    > } );

    It's mostly my fault, but maybe the version below is easier to follow, and possibly more familiar and compact?

    my @headers = $tree->findvalues( '(//table[@class="HeaderFrame"])[1]//td[@class="HeaderTitle"]' );
    Ron
