in reply to Scraping Rendered Text that is not in Source Code
Your script finds the element you seem to be looking for once I fix the bad XPath query in line 21:
//dataAddress2[id="city"]
would be searching for an HTML tag dataAddress2, which does not exist on that page (nor anywhere else). The predicate [id="city"] also tests for a child element named id rather than the id attribute, which XPath writes as @id.
As you are searching for an element with an id attribute anyway, and id attributes are (supposed to be) unique across the page, the following XPath expression extracts the element for me (provided I've unblocked the crappy JavaScript on all those pages in NoScript):
//*[@id="city"]
To check which element I've captured, I like to print ->{innerHTML}:
print "..." . $mech->xpath('//*[@id="city"]', one => 1)->{innerHTML};
It seems that the JavaScript gets triggered after some time without another event, and the element just gets filled in instead of actually appearing, so you might need to wait in a loop and watch the element's content change from its initial value to the content you actually want.
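Here is a minimal sketch of such a wait loop. The helper name wait_for_content, its parameters, and the default "contains something other than whitespace" test are my own assumptions for illustration, not part of WWW::Mechanize::Firefox. I don't know what the element holds before the script fills it in, so pass your own ready check if the initial content is not empty.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: call $args{fetch} repeatedly until $args{ready}
# accepts the returned value, or until $args{timeout} seconds have passed.
# Returns the accepted value, or undef on timeout.
sub wait_for_content {
    my (%args) = @_;
    my $fetch    = $args{fetch};
    # Assumed default: any content that is not just whitespace counts as ready.
    my $ready    = $args{ready} || sub { defined $_[0] && $_[0] =~ /\S/ };
    my $timeout  = $args{timeout}  // 10;    # seconds
    my $interval = $args{interval} // 0.5;   # seconds between polls

    my $deadline = time + $timeout;
    while (1) {
        my $value = $fetch->();
        return $value if $ready->($value);
        return if time >= $deadline;
        sleep $interval;
    }
}

# Usage with the $mech object from your script (needs a running Firefox,
# so it is left commented out here):
#
# my $city = wait_for_content(
#     fetch => sub { $mech->xpath('//*[@id="city"]', one => 1)->{innerHTML} },
# );
# print defined $city ? "City: $city\n" : "Gave up waiting\n";
```

The fetch callback re-runs the XPath query on every pass, so it also copes with the page replacing the element rather than just changing its content.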
Re^2: Scraping Rendered Text that is not in Source Code
by bobross419 (Acolyte) on Oct 31, 2010 at 19:58 UTC