I've been fighting with this one all day. I've gone through quite a few threads here, but none of them seem to help. I'm trying to find a way to get text that is rendered on the screen but is absent from the page source.
I've come across numerous posts that say you should use this module or that module, but quite frankly the documentation for some of these modules is too sparse for a novice to follow.
I'm currently working with WWW::Mechanize::Firefox (which was suggested quite a few times), but it only seems to return the page's basic source code, not what is actually rendered on the screen.
I've also tried WWW::Scripter with the JavaScript plugin, without success.
You can find the related PerlMonks threads I've already gone through listed in the last post of this thread: http://www.perlmonks.org/?node_id=821773
I also tried an ATT proxy tool that is supposed to let you see all the data being passed, but as far as I could tell it did nothing at all.
At one point I tried to install the Firefox screen render plugin, but it appears to no longer be available. I did find the View Source Chart add-on, and its source chart does include the rendered text. However, I have no way of getting the data from that chart into Perl.
Does anyone have a way to do this other than what has already been suggested? Or, at the very least, could someone point me to some worthwhile documentation? I've read everything on CPAN about WWW::Mechanize::Firefox (FAQ, Troubleshooting, Examples, etc.), but nothing seems to explain how to actually pull this information. The JavaScript examples don't seem to work at all and just throw compilation errors.
I've only been using Perl for a few weeks, but I love the language. I just need a push in the right direction for doing this.
Here is the page that I'm using as an example: http://www.acehardware.com/mystore/storeDetail.jsp?store=14671
I'm trying to see if I can get the address information. I know the span/div IDs for the elements I want; I just can't get them to come through.
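When the id of the target element is known, the usual XPath form is `//*[@id="city"]`: the `@` makes the predicate test an attribute. Without it, `[id="city"]` tests for a child *element* named `id`, and a step like `//dataAddress2` only matches elements whose tag name is literally `dataAddress2`. A minimal pure-Perl sketch of a helper that builds the by-id expression and quotes the value as an XPath 1.0 string literal (the names `xpath_by_id` and `xpath_literal` are hypothetical, not part of WWW::Mechanize::Firefox):

```perl
use strict;
use warnings;

# Build an XPath expression selecting any element whose id attribute equals $id.
# The "@" is what makes the predicate test an attribute rather than a child element.
sub xpath_by_id {
    my ($id) = @_;
    return '//*[@id=' . xpath_literal($id) . ']';
}

# Quote a string as an XPath 1.0 string literal. XPath 1.0 has no escape
# sequences, so a value containing both quote characters is spliced together
# with concat().
sub xpath_literal {
    my ($s) = @_;
    return qq{"$s"} unless $s =~ /"/;
    return qq{'$s'} unless $s =~ /'/;
    # Split on double quotes and rejoin the pieces with '"' in between.
    my @parts = map { qq{"$_"} } split /"/, $s, -1;
    return 'concat(' . join(q{, '"', }, @parts) . ')';
}

print xpath_by_id('city'), "\n";    # prints //*[@id="city"]
```

The resulting string is what would go into a call such as `$mech->xpath(..., one => 1)`; whether the store page's address element really carries `id="city"` is taken from the post, not checked here.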
Here is what I have so far with WWW::Mechanize::Firefox:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new(autoclose => 0);
$mech->allow(javascript => 1);

#$mech->get('http://www.acehardware.com/mystore/storeDetail.jsp?store=14671', ':content_file' => 'webpage.txt');
$mech->get('http://www.acehardware.com/mystore/storeDetail.jsp?store=14671');

# Poll for up to about 10 seconds until the #city element is visible
my $retries = 10;
while ($retries-- and ! $mech->is_visible( xpath => '//*[@id="city"]' )) {
    sleep 1;
}
# $retries only ends at -1 if every check failed
die "Timeout" if $retries < 0;

# Now the element exists
#$mech->click({xpath => '//*[@id="submit"]'});
print "..." . $mech->xpath('//dataAddress2[id="city"]', one => 1);

# Save whatever content the browser currently holds
open(my $fh, '>', 'test.txt') or die "Unable to open test.txt: $!";
print {$fh} $mech->content;
close $fh;
print "DONE!\n";
This gives me the following error:
No elements found for '//dataAddress2[id="city"]' at script2.pl line 2.
Thanks in advance from an aspiring perlophyte.
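A counter-based polling loop like the one in the script is easy to get subtly wrong: with `$retries--` in the `while` condition, the counter ends at -1 rather than 0 when the element never shows up, so a check like `unless $retries` does not detect the timeout. A minimal sketch of the same wait-and-retry idea as a reusable sub (the name `wait_for` and its argument order are hypothetical), which simply returns true or false:

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # core module; allows fractional delays

# Call $check up to $tries times, sleeping $delay seconds between attempts.
# Returns 1 as soon as $check returns true, 0 if every attempt fails.
sub wait_for {
    my ($check, $tries, $delay) = @_;
    for my $attempt (1 .. $tries) {
        return 1 if $check->();
        sleep $delay if $attempt < $tries;   # no sleep after the final attempt
    }
    return 0;
}

# Demo with a stand-in condition that becomes true on the third call.
my $calls = 0;
my $ok = wait_for(sub { ++$calls >= 3 }, 10, 0.01);
print $ok ? "ready after $calls calls\n" : "Timeout\n";   # prints "ready after 3 calls"
```

In the script, the condition would be the call the post already uses, e.g. `sub { $mech->is_visible(xpath => '//*[@id="city"]') }`, with `or die "Timeout"` after the `wait_for(...)` call.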
In reply to Scraping Rendered Text that is not in Source Code by bobross419