nkagolanu has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to extract data from a website, and I am using LWP::UserAgent for it.
    use LWP::UserAgent;
    use HTTP::Request;

    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");
    $ua->proxy('http', 'my proxy');

    # Create a request
    my $req = HTTP::Request->new(
        GET => 'http://dtcc.com/products/derivserv/data_table_iv.php');
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('query=libwww-perl&mode=dist');

    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);

    # Check the outcome of the response
    if ($res->is_success) {
        print $res->content;
    }
    else {
        print $res->status_line, "\n";
    }
The DTCC site presents an agree/decline form and only displays the data after you click the Agree button. I want to extract and parse that data, but my script is retrieving the agree/decline page rather than the data page. Is there a way to automate clicking the Agree button so that I can extract the data on the second page?

Replies are listed 'Best First'.
Re: Extracting data from a website.
by moritz (Cardinal) on Jan 06, 2011 at 21:16 UTC
Re: Extracting data from a website.
by oko1 (Deacon) on Jan 07, 2011 at 00:14 UTC

    Let WWW::Mechanize::Shell build you a WWW::Mechanize script. I just learned about it a few days ago, and it's a wonderful gadget. Goes something like this:

    perl -MWWW::Mechanize::Shell -weshell
    > get http://your_site
    > form                # Displays the form variables
    > click Agree         # Select the right button to click, or just 'submit'
                          # if there's only 1 form
    > content             # Show the page content
    > script /tmp/foobar  # Save a WWW::Mechanize-based script in /tmp
    > q                   # Quit
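    The saved script would look roughly like this hand-written equivalent (a sketch only: the URL and proxy come from the question, but the button value 'Agree' is an assumption about the DTCC form -- run the shell's 'form' command to see the real field and button names):

    ```perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( agent => 'MyApp/0.1' );
    $mech->proxy( 'http', 'my proxy' );   # same proxy setting as the question

    # Fetch the agree/decline page
    $mech->get('http://dtcc.com/products/derivserv/data_table_iv.php');

    # Click the Agree button to submit the form; the value 'Agree'
    # is a guess -- inspect the form to confirm it
    $mech->click_button( value => 'Agree' );

    # $mech now holds the data page, ready to be parsed
    print $mech->content;
    ```

    If the page has only one form, $mech->submit() instead of click_button() may be all you need.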

    I used it to script deleting comments in WordPress, and was very impressed. Took about 30 seconds to "write" that script.

    -- 
    Education is not the filling of a pail, but the lighting of a fire.
     -- W. B. Yeats