nkagolanu has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to extract data from a website, and I am using LWP::UserAgent for it.
    use LWP::UserAgent;
    use HTTP::Request;

    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");
    $ua->proxy('http', 'my proxy');

    # Create a request
    my $req = HTTP::Request->new(
        GET => 'http://dtcc.com/products/derivserv/data_table_iv.php');
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('query=libwww-perl&mode=dist');

    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);

    # Check the outcome of the response
    if ($res->is_success) {
        print $res->content;
    }
    else {
        print $res->status_line, "\n";
    }
The DTCC site presents an agree/decline form and only displays the data after you click the Agree button. I want to extract and parse that data, but my script is retrieving the agree/decline page rather than the data page. Is there a way to automate clicking the Agree button so that I can extract the data on the second page?

Replies are listed 'Best First'.
Re: Extracting data from a website.
by moritz (Cardinal) on Jan 06, 2011 at 21:16 UTC
Re: Extracting data from a website.
by oko1 (Deacon) on Jan 07, 2011 at 00:14 UTC

    Let WWW::Mechanize::Shell build you a WWW::Mechanize script. I just learned about it a few days ago, and it's a wonderful gadget. Goes something like this:

    perl -MWWW::Mechanize::Shell -weshell
    > get http://your_site
    > form                # Displays the form variables
    > click Agree         # Select the right button to click, or just 'submit'
                          # if there's only 1 form
    > content             # Show the page content
    > script /tmp/foobar  # Save a WWW::Mechanize-based script in /tmp
    > q                   # Quit
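    The saved script would look roughly like this hand-written equivalent (a sketch only: the URL and proxy come from the question, but the button value 'Agree' is an assumption about the DTCC form -- run the shell's 'form' command to see the real field and button names):

    ```perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( agent => 'MyApp/0.1' );
    $mech->proxy( 'http', 'my proxy' );   # same proxy setting as the question

    # Fetch the agree/decline page
    $mech->get('http://dtcc.com/products/derivserv/data_table_iv.php');

    # Click the Agree button to submit the form; the value 'Agree'
    # is a guess -- inspect the form to confirm it
    $mech->click_button( value => 'Agree' );

    # $mech now holds the data page, ready to be parsed
    print $mech->content;
    ```

    If the page has only one form, $mech->submit() instead of click_button() may be all you need.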

    I used it to script deleting comments in WordPress, and was very impressed. Took about 30 seconds to "write" that script.

    -- 
    Education is not the filling of a pail, but the lighting of a fire.
     -- W. B. Yeats