How to get an HTML table from the document using WWW::Mechanize::Firefox ?

Smeet2002 has asked for the wisdom of the Perl Monks concerning the following question:

I don't understand how to access an HTML table from a document.

I am playing with the job link (I am not able to post it here, but you can get it from here: Question on StackOverflow).

The idea is to click the "Next page" button several times and gather all the small HTML tables into one.

When I open the link with WWW::Mechanize::Firefox, I can get the whole document (and the HTML table of the first page) with my $cont= $mech->content( format => 'html' );

After that I click the "Next page" button with


my $id="search_result_next_page_link"; 
$mech->click({ xpath => qq{//*[\@id="$id"]}, synchronize => 0 });

I can click the button many times and the table changes inside the document, but I cannot use $mech->content any more, because the URL stays the same and the content it returns does not change.

I tried something like @tt= $mech->xpathEx(xpath=>'/html/body/form/div[4]/div/main/div/div[3]/section/div/div/table/'); which gives me something like MozRepl::RemoteObject, but I have no idea how to get the actual HTML code of the table from it.

Replies are listed 'Best First'.
Re: How to get an HTML table from the document using WWW::Mechanize::Firefox ? by Corion (Patriarch) on Oct 28, 2014 at 19:39 UTC
As I already told you two times via email, please try the "innerHTML" property: `my @tt= $mech->xpathEx(...); print $tt[0]->{innerHTML};`
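A minimal sketch of the suggestion above, with a guard for the case where the query matches nothing. Plain hash references stand in for the node objects that `xpathEx` or `selector` would return, so this runs without Firefox; the helper name `first_inner_html` and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Return the innerHTML of the first node, or undef when no node was found.
# A node is anything that supports ->{innerHTML}; in the real script these
# would be the objects returned by the $mech query.
sub first_inner_html {
    my @nodes = @_;
    return undef unless @nodes;
    return $nodes[0]->{innerHTML};
}

# Stand-in data (hypothetical), so the sketch runs without a browser.
my @found   = ( { innerHTML => '<tr><td>Job 1</td></tr>' } );
my @nothing = ();

for my $case ( [ found => \@found ], [ nothing => \@nothing ] ) {
    my ($label, $nodes) = @$case;
    my $html = first_inner_html(@$nodes);
    print "$label: ", ( defined $html ? $html : '(query matched no elements)' ), "\n";
}
```

Checking the size of the result list first separates "the query found nothing" from "the element has no innerHTML".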
Re^2: How to get an HTML table from the document using WWW::Mechanize::Firefox ? by Smeet2002 (Initiate) on Oct 28, 2014 at 19:59 UTC
I tried it right after our email exchange, but it didn't work for me. Here is a piece of my code:

....
my $id="search_result_next_page_link";
$mech->click({ xpath => qq{//*[\@id="$id"]}, synchronize => 0 });
sleep 2;
my @tt= $mech->xpathEx(xpath=>'/html/body/form/div[4]/div/main/div/div[3]/section/div/div/table/');
print $tt[0]->{innerHTML};
print "\n---\n";
....

And here is what I get:

>perl search_scotia.pl
Use of uninitialized value in print at search_scotia.pl line 37.
---

The `{'innerHTML'}` property returns nothing...
Re^3: How to get an HTML table from the document using WWW::Mechanize::Firefox ? by Corion (Patriarch) on Oct 28, 2014 at 20:54 UTC
Then your XPath query did not return anything. Please post a short, self-contained program that reproduces the problem. That will help us reproduce the problem and maybe find a solution.

Looking at the HTML of that link you posted, why don't you use the `class` of the target element? `$mech->selector('.tableSearchResults')`

Also note that the first argument to `->xpathEx` is the XPath query, not `xpath`. After fixing that part, Firefox complains that your XPath query is invalid syntax. After fixing that, your XPath query seems to go astray somewhere because no elements are found.

Maybe you want to try the following small example?

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech= WWW::Mechanize::Firefox->new();
$mech->get('http://jobs.scotiabank.com/search/advanced-search/ASCategory/IT/ASPostedDate/-1/ASCountry/Canada/ASState/Ontario/ASCity/Toronto/ASLocation/-1/ASCompanyName/-1/ASCustom1/-1/ASCustom2/-1/ASCustom3/-1/ASCustom4/-1/ASCustom5/-1/ASIsRadius/false/ASCityStateZipcode/-1/ASDistance/-1/ASLatitude/-1/ASLongitude/-1/ASDistanceType/-1');

my $id="search_result_next_page_link";
$mech->click({ xpath => qq{//*[\@id="$id"]}, synchronize => 0 });
sleep 2;

# Original call: "xpath" is passed where the query itself should be
my @tt= $mech->xpathEx(xpath=>'/html/body/form/div[4]/div/main/div/div[3]/section/div/div/table/');
print 0+@tt;
print $tt[0]->{innerHTML};
print "\n<--- bad API usage\n";

# Query passed as the first argument, trailing slash removed
@tt= $mech->xpathEx('/html/body/form/div[4]/div/main/div/div[3]/section/div/div/table');
print 0+@tt;
print $tt[0]->{innerHTML};
print "\n<--- fixed XPath\n";

# CSS selector on the class of the table
@tt= $mech->selector('.tableSearchResults');
print 0+@tt;
print $tt[0]->{innerHTML};
print "\n<--- CSS\n";
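Building on the CSS selector above, the original goal of gathering the table from several pages could be sketched roughly as follows. This is only a sketch under assumptions: the helper name `collect_tables`, the number of clicks, and the fixed delay are made up for illustration, and a fixed `sleep` is a crude way to wait for the page script to swap in the new table.

```perl
use strict;
use warnings;

# Collect the innerHTML of the results table from the current page and from
# each page reached by clicking "Next page" up to $clicks times.
# $mech is expected to behave like a WWW::Mechanize::Firefox object.
# $delay is the number of seconds to wait after each click (default 2).
sub collect_tables {
    my ($mech, $clicks, $delay) = @_;
    $delay = 2 unless defined $delay;

    my $id = "search_result_next_page_link";
    my @tables;
    for my $page (0 .. $clicks) {
        if ($page > 0) {
            $mech->click({ xpath => qq{//*[\@id="$id"]}, synchronize => 0 });
            sleep $delay if $delay;
        }
        my @tt = $mech->selector('.tableSearchResults');
        last unless @tt;    # stop when no table is found on this page
        push @tables, $tt[0]->{innerHTML};
    }
    return @tables;
}

# Usage, assuming Firefox with the mozrepl extension is running and $url
# holds the address of the search results page:
#   use WWW::Mechanize::Firefox;
#   my $mech = WWW::Mechanize::Firefox->new();
#   $mech->get($url);
#   my @tables = collect_tables($mech, 3);
#   print scalar(@tables), " tables collected\n";
```

Each entry holds the contents of one page's table, so the rows can be concatenated inside a single table element afterwards.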
Re^4: How to get an HTML table from the document using WWW::Mechanize::Firefox ? by Smeet2002 (Initiate) on Oct 29, 2014 at 02:25 UTC
Re^5: How to get an HTML table from the document using WWW::Mechanize::Firefox ? by Corion (Patriarch) on Oct 29, 2014 at 08:00 UTC