in reply to HTML::TreeBuilder scan for first table

Use HTML::TreeBuilder::XPath together with htmltreexpather.pl / xpather.pl / examples (for tree-xpath and others) / walkthroughs / tutorials ...

Write something like

    use HTML::TreeBuilder::XPath;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content( $content );

    my @headers = $tree->findnodes( q{ //table[@class='HeaderFrame' ] } )
        ->shift    ## get first one
        ->findvalues( q{ .//td[@class='HeaderTitle'] } );

    print "@headers\n";

Or even all in one xpath expression

    my @headers = $tree->findvalues( q{
        (
            //table[ @class = 'HeaderFrame' ]
        )[1]
        //td[ @class = 'HeaderTitle' ]
    } );
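The parentheses in that expression matter. Under XPath 1.0 rules, a predicate such as [1] written directly after a step applies per parent, so without the parentheses it should select the first matching table under each parent rather than the first one in the whole document. Below is a minimal sketch of the difference; the sample markup and the variable names @per_parent and @first_only are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup: two HeaderFrame tables under different parents.
my $content = <<'HTML';
<html><body>
<div><table class="HeaderFrame"><tr><td class="HeaderTitle">First</td><td class="Other">x</td></tr></table></div>
<div><table class="HeaderFrame"><tr><td class="HeaderTitle">Second</td></tr></table></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($content);

# [1] attached to the step: first matching table among each parent's children.
my @per_parent = $tree->findvalues(
    q{//table[@class='HeaderFrame'][1]//td[@class='HeaderTitle']}
);

# [1] attached to the parenthesised node set: first matching table in the document.
my @first_only = $tree->findvalues(
    q{(//table[@class='HeaderFrame'])[1]//td[@class='HeaderTitle']}
);

print "per parent: @per_parent\n";
print "first only: @first_only\n";
```

If the page only ever has one such table per parent, the two forms can look identical on a small test file, which is why the parenthesised form is the safer one when you want exactly the first table.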

Re^2: HTML::TreeBuilder scan for first table ( HTML::TreeBuilder::XPath )
by mazdajai (Novice) on Jan 22, 2016 at 15:46 UTC
    Thanks for the suggestions, everyone. I will try them. Is there an easier way to inspect the tree elements in TreeBuilder or TableExtract? For example, there are online parsers where you can test a regex; I am curious whether there is anything similar that can help debug when an element is not being retrieved as expected.

      > Is there an easier way to inspect the tree elements in TreeBuilder or TableExtract?

      Which name is mentioned in the code?

      > For example, there are online parsers where you can test a regex; I am curious whether there is anything similar that can help debug when an element is not being retrieved as expected.

      The *xpather*s help you craft the xpaths you can use to retrieve the stuff you want.

      When the html changes significantly, you run the *xpather*s again to craft new xpaths.
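For inspecting the parsed tree by hand, a rough sketch along these lines may also help. It uses only methods inherited from HTML::Element (dump, tag, address, as_text); the sample markup and the $xpath value are placeholders, so substitute your own page and expression.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup standing in for the real page.
my $content = <<'HTML';
<html><body>
<table class="HeaderFrame"><tr><td class="HeaderTitle">Name</td><td class="HeaderTitle">Date</td></tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($content);

# 1. Print the whole parsed tree as an indented outline on STDOUT,
#    to see how the parser actually nested the elements.
$tree->dump;

# 2. Run a candidate XPath and report what each hit is and where it sits.
my $xpath = q{//td[@class='HeaderTitle']};
my @nodes = $tree->findnodes($xpath);

printf "%d node(s) matched %s\n", scalar @nodes, $xpath;
for my $node (@nodes) {
    printf "%s at %s: %s\n", $node->tag, $node->address, $node->as_text;
}
```

When an expression matches nothing, shortening it one step at a time (for example trying just //table first) and re-running the count usually shows which step is the one that fails.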

Re^2: HTML::TreeBuilder scan for first table ( HTML::TreeBuilder::XPath )
by mr_ron (Deacon) on Jan 25, 2016 at 21:25 UTC

    Sorry - Guess I got lost in the formatting of the post I replied to. I may not have seen XPath formatted this way before.

    > Or even all in one xpath expression
    >
    > my @headers = $tree->findvalues( q{
    >     (
    >         //table[ @class = 'HeaderFrame' ]
    >     )[1]
    >     //td[ @class = 'HeaderTitle' ]
    > } );

    It's mostly my fault, but maybe the version below is easier to follow, and possibly more familiar and compact?

    my @headers = $tree->findvalues( '(//table[@class="HeaderFrame"])[1]//td[@class="HeaderTitle"]' );
    Ron