geektron has asked for the wisdom of the Perl Monks concerning the following question:
I've been tasked with screen-scraping what originally looked like an easy page, but it turns out the entire page is built with calls to JavaScript's document.write. I suspect the engineers were trying to prevent screen scraping in the first place ...
I can get the information I need out of the page with a little reverse-engineering and by parsing an array of arrays (in JavaScript). I've read through the JavaScript::SpiderMonkey docs to see if it will DWIM, but I can't tell from the perldoc whether I can use JavaScript::SpiderMonkey to extract arrays from the page code, or whether I'm going to have to resort to some brute-force parsing of the page.
Is JavaScript::SpiderMonkey what I'm looking for in this case? Or should I stick with some combination of something like WWW::Mechanize, LWP, etc.?
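For illustration, here is a minimal sketch of the "brute-force parsing" option. It assumes things the post does not say: that the data sits in a single `var NAME = [...];` assignment, and that the literal happens to be JSON-compatible (double-quoted strings, numbers, nested arrays). The variable name `rows`, the sample markup, and the helper `extract_rows` are all hypothetical. In practice `$html` would come from something like LWP::UserAgent rather than a heredoc.

```perl
use strict;
use warnings;
use JSON::PP;    # in the Perl core since 5.14

# Hypothetical page source; the variable name "rows" and its contents are made up.
my $html = <<'HTML';
<script type="text/javascript">
var rows = [
  ["Widget", 3, "blue"],
  ["Gadget", 7, "red"]
];
document.write("<table>");
</script>
HTML

# Pull the array literal assigned to $name out of the page source and decode it.
# Assumes the literal is valid JSON and never contains "];" inside a string.
sub extract_rows {
    my ($source, $name) = @_;
    my ($literal) = $source =~ /\bvar\s+\Q$name\E\s*=\s*(\[.*?\])\s*;/s
        or return;    # no such assignment found
    return JSON::PP->new->decode($literal);
}

my $rows = extract_rows($html, 'rows');
print join(', ', @$_), "\n" for @$rows;
```

If the real literal uses single quotes, trailing commas, or expressions, this approach breaks down and a real JavaScript engine becomes the more robust choice.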
Re: more screen scraping with embedded Javascript
by Joost (Canon) on Oct 25, 2004 at 20:52 UTC
by geektron (Curate) on Oct 25, 2004 at 22:21 UTC

Re: more screen scraping with embedded Javascript
by johnnywang (Priest) on Oct 26, 2004 at 02:17 UTC
by geektron (Curate) on Oct 26, 2004 at 02:49 UTC