Blue_eyed_son has asked for the wisdom of the Perl Monks concerning the following question:

Hi. Has anyone figured out how to spider Lexis-Nexis? I'd like to collect data on newspaper reporting, but the guided news search page relies on JavaScript, so I'm not sure how to submit my searches from a script. I've tried WWW::Mechanize, but as far as I can tell it can't handle the page's JavaScript. The webpage is:

sample

Any help would be *SWEET*.

Re: Lexis-Nexis Academic spider
by mpeters (Chaplain) on Apr 26, 2005 at 16:27 UTC
    Regardless of whether the page uses JavaScript, it still has to communicate with the server over HTTP. All you need to find out is whether the JavaScript changes what gets sent to the server (for example, by adding or rewriting form fields before submitting). You might also have to step outside WWW::Mechanize and build the HTTP::Request yourself. It's really not that bad, and you can even use the two together:
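    use HTTP::Request::Common qw(POST);

    # build out your request (POST is a function exported by
    # HTTP::Request::Common; it returns an HTTP::Request object)
    my $request = POST(....);
    $mech->request($request);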
    And then you can continue using Mech.
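To judge whether the JavaScript matters, it helps to know what a plain form submission looks like on the wire. HTTP::Request::Common's POST builds such a body for you from an array reference of fields; the minimal sketch below does the same kind of encoding by hand in core Perl, only so the result is easy to print and compare against the request the browser really sends. The helpers form_escape and form_body and the field names terms and dateFrom are hypothetical, not the real Lexis-Nexis form fields, and the set of characters left unescaped follows typical browser behaviour, which may differ slightly from HTTP::Request::Common's output.

```perl
use strict;
use warnings;

# Percent-encode one name or value roughly the way browsers do for
# application/x-www-form-urlencoded bodies: letters, digits and -_.*
# are kept, spaces become '+', every other byte becomes %XX.
# Assumes byte strings (encode text to UTF-8 first).
sub form_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.* ])/sprintf('%%%02X', ord $1)/ge;
    $s =~ tr/ /+/;
    return $s;
}

# Join ordered name => value pairs into a POST body.
sub form_body {
    my @pairs = @_;
    my @parts;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @parts, form_escape($name) . '=' . form_escape($value);
    }
    return join '&', @parts;
}

# Hypothetical field names: replace them with the ones the real
# search form actually submits once its JavaScript has run.
my $body = form_body(terms => 'election & "reform"', dateFrom => '01/01/2004');
print "$body\n";
# prints: terms=election+%26+%22reform%22&dateFrom=01%2F01%2F2004
```

If the body the browser sends matches what you get from the form's own fields, the JavaScript is probably just cosmetic and Mech (or a hand-built POST) can submit the search directly.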

    If you're having a hard time figuring out which HTTP requests the browser is sending to the server, check out something like the LiveHTTPHeaders plugin for Mozilla, or Ethereal.
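Once you have a captured POST body in hand, one quick way to spot what the JavaScript contributed is to decode it and compare its field names with the ones in the page's HTML form. A minimal sketch in core Perl, assuming the body is application/x-www-form-urlencoded; the captured string, the field names and the helpers form_unescape and parse_form_body are all hypothetical.

```perl
use strict;
use warnings;

# Undo application/x-www-form-urlencoded encoding: '+' is a space and
# %XX is one byte. '+' is handled first so that %2B stays a literal '+'.
sub form_unescape {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Split a body into ordered [name, value] pairs. Pairs rather than a
# hash keep repeated fields and their original order.
sub parse_form_body {
    my ($body) = @_;
    return map {
        my ($name, $value) = split /=/, $_, 2;
        [ form_unescape($name), form_unescape(defined $value ? $value : '') ]
    } grep { length } split /&/, $body;
}

# Hypothetical body copied out of the captured request.
my $captured = 'terms=election+%26+reform&searchType=news&jsToken=abc123';

# Field names found in the page's HTML <form> (also hypothetical).
my %in_html = map { $_ => 1 } qw(terms searchType);

for my $pair (parse_form_body($captured)) {
    my ($name, $value) = @$pair;
    my $note = $in_html{$name} ? '' : '   <- not in the HTML form; possibly set by JavaScript';
    print "$name = $value$note\n";
}
```

For the made-up body above, only jsToken gets flagged; a field like that is the one you would have to reproduce yourself before posting the search.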
Re: Lexis-Nexis Academic spider
by ikegami (Patriarch) on Apr 26, 2005 at 16:27 UTC
Re: Lexis-Nexis Academic spider
by BigRare (Pilgrim) on Apr 26, 2005 at 19:58 UTC
    The easiest way to get this to work is to use two modules: HTTP::Proxy and HTTP::Recorder.

    Check the CPAN documentation on them.

    HTTP::Proxy sets up a proxy server, which you can set your browser to use.
    HTTP::Recorder sits in the proxy and records your actions and generates a script for WWW::Mechanize to use.

    Example code using these modules:
    use HTTP::Proxy;
    use HTTP::Recorder;

    # the path must be quoted; a bare /path/... is a syntax error
    my $file = '/path/to/file/to/log';

    # listen on port 3128; point your browser's HTTP proxy at this port
    my $proxy = HTTP::Proxy->new(port => 3128);

    # create a new HTTP::Recorder object
    my $agent = HTTP::Recorder->new;

    # set the log file (optional)
    $agent->file($file);

    # set HTTP::Recorder as the agent for the proxy
    $proxy->agent($agent);

    # start the proxy
    $proxy->start();