Smeet2002 has asked for the wisdom of the Perl Monks concerning the following question:

I don't understand how to access an HTML table from a document.

I am playing with the job link (I am not able to post it here, but you can get it from here: Question on StackOverflow).

The idea is to click on the "Next page" button several times and gather all the small HTML tables into one.

When I open the link with WWW::Mechanize::Firefox, I can get the whole document (and the first page's HTML table) with my $cont= $mech->content( format => 'html' );

After that I click on the "Next page" button with

my $id="search_result_next_page_link";
$mech->click({ xpath => qq{//*[\@id="$id"]}, synchronize => 0 });

I can click the button many times and the table changes inside the document, but I cannot use $mech->content any more, because the URL stays the same and the returned content does not change.

I tried something like @tt= $mech->xpathEx(xpath=>'/html/body/form/div[4]/div/main/div/div[3]/section/div/div/table/'); which gives me something like a MozRepl::RemoteObject, but that gives me no idea how to get the actual HTML table code.

Re: How to get an HTML table from the document using WWW::Mechanize::Firefox ?
by Corion (Patriarch) on Oct 28, 2014 at 19:39 UTC

    As I already told you two times via email, please try the "innerHTML" property.

    my @tt= $mech->xpathEx(...);
    print $tt[0]->{innerHTML};

      I tried it right after our email exchange, but it didn't work for me.

      Here is a piece of my code:

      ....
      my $id="search_result_next_page_link";
      $mech->click({ xpath => qq{//*[\@id="$id"]}, synchronize => 0 });
      sleep 2;
      my @tt= $mech->xpathEx(xpath=>'/html/body/form/div[4]/div/main/div/div[3]/section/div/div/table/');
      print $tt[0]->{innerHTML};
      print "\n---\n";

      ....

      And here is what I get:

      >perl search_scotia.pl
      Use of uninitialized value in print at search_scotia.pl line 37.
      ---

      Property {'innerHTML'} returns nothing...

        Then your XPath query did not return anything.
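To illustrate the point: when a query matches nothing, the result list is empty, so `$tt[0]->{innerHTML}` is undefined and `print` warns about an uninitialized value. Below is a minimal sketch of a guard against that. The helper name `first_inner_html` is hypothetical (it is not part of WWW::Mechanize::Firefox), and plain hash references are assumed here as stand-ins for the MozRepl::RemoteObject nodes a real query would return.

```perl
use strict;
use warnings;

# Hypothetical helper: return the innerHTML of the first matched node,
# or nothing (with a readable warning) when the query matched no elements.
sub first_inner_html {
    my @nodes = @_;
    unless (@nodes) {
        warn "Query matched no elements\n";
        return;
    }
    return $nodes[0]->{innerHTML};
}

# Assumption: plain hashrefs stand in for nodes returned by xpathEx/selector.
my @empty = ();
my @found = ({ innerHTML => '<tr><td>job</td></tr>' });

my $html = first_inner_html(@found);
print "found: $html\n" if defined $html;

my $none = first_inner_html(@empty);
print "nothing matched\n" unless defined $none;
```

Checking the size of the result list first (as `print 0+@tt` does) turns a confusing "uninitialized value" warning into a clear statement that the query found no elements.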

        Please post a short, self-contained program that reproduces the problem. That will help us reproduce the problem and maybe find a solution.

        Looking at the HTML of that link you posted, why don't you use the class of the target element?

        $mech->selector('.tableSearchResults')

        Also note that the first argument to ->xpathEx is the XPath query, not xpath.

        After fixing that part, Firefox complains that your XPath query has invalid syntax (the trailing slash). After fixing that too, your XPath query seems to go astray somewhere, because no elements are found. Maybe you want to try the following small example?

        use strict;
        use warnings;
        use WWW::Mechanize::Firefox;

        my $mech= WWW::Mechanize::Firefox->new();
        $mech->get('http://jobs.scotiabank.com/search/advanced-search/ASCategory/IT/ASPostedDate/-1/ASCountry/Canada/ASState/Ontario/ASCity/Toronto/ASLocation/-1/ASCompanyName/-1/ASCustom1/-1/ASCustom2/-1/ASCustom3/-1/ASCustom4/-1/ASCustom5/-1/ASIsRadius/false/ASCityStateZipcode/-1/ASDistance/-1/ASLatitude/-1/ASLongitude/-1/ASDistanceType/-1');

        my $id="search_result_next_page_link";
        $mech->click({ xpath => qq{//*[\@id="$id"]}, synchronize => 0 });
        sleep 2;

        my @tt= $mech->xpathEx(xpath=>'/html/body/form/div[4]/div/main/div/div[3]/section/div/div/table/');
        print 0+@tt;
        print $tt[0]->{innerHTML};
        print "\n<--- bad API usage\n";

        @tt= $mech->xpathEx('/html/body/form/div[4]/div/main/div/div[3]/section/div/div/table');
        print 0+@tt;
        print $tt[0]->{innerHTML};
        print "\n<--- fixed XPath\n";

        @tt= $mech->selector('.tableSearchResults');
        print 0+@tt;
        print $tt[0]->{innerHTML};
        print "\n<--- CSS\n";
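Building on the CSS selector variant, the original goal (clicking "Next page" several times and gathering every table) might be sketched roughly as follows. This is only an outline under assumptions: `collect_tables` is a hypothetical helper, not part of WWW::Mechanize::Firefox; it relies only on the `selector` and `click` calls already used in the example above; and the fixed delay is a crude stand-in for waiting until the page has really updated.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize::Firefox): gather the
# results table from up to $pages pages. $mech is any object that offers
# selector() and click() as used in the example above.
sub collect_tables {
    my ($mech, $pages, $delay) = @_;
    $delay = 2 unless defined $delay;    # seconds to wait after each click
    my @tables;
    for my $page (1 .. $pages) {
        my @found = $mech->selector('.tableSearchResults');
        last unless @found;              # stop if the table is missing
        push @tables, $found[0]->{innerHTML};
        last if $page == $pages;         # no click needed after the last page
        $mech->click({
            xpath       => '//*[@id="search_result_next_page_link"]',
            synchronize => 0,
        });
        sleep $delay if $delay;
    }
    return @tables;
}

# Usage sketch (needs Firefox with the MozRepl add-on running):
#   use WWW::Mechanize::Firefox;
#   my $mech = WWW::Mechanize::Firefox->new();
#   $mech->get($url);                    # $url: the search URL from above
#   my $merged = '<table>' . join('', collect_tables($mech, 3)) . '</table>';
```

The table is re-queried after every click, because the page rewrites it in place while the URL stays the same. If the table has not been refreshed within the delay, the same rows will be collected twice, so comparing consecutive results may be worthwhile.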