billycote has asked for the wisdom of the Perl Monks concerning the following question:

I want to be able to go to the following web site periodically and pull back the data: http://emma.msrb.org/MarketActivity/RecentTrades.aspx This seems simple enough, yet a get() only returns the page telling me I have to agree to the terms and conditions. I know that agreeing gives me a cookie and then I'm good to go. I want to obtain that cookie programmatically and then fetch the page. I think I should be able to do this. I've tried both the CGI methods and the HTTP::Cookies methods, but I can't get much more than the first page no matter what I do. I definitely feel it has to do with the cookie. I'd post my code but it's really ugly at this point with all the comments... Is there something I'm fundamentally missing here? Here's the code anyway. Like I said, it's ugly and missing a lot of stuff.
#!/apps/VA/perl5.8.0/bin/perl
use Net::SMTP;
use Sys::Hostname;
use HTTP::Cookies;
use HTTP::Request::Common;
use Time::Local;
use POSIX (strftime);
use LWP::UserAgent;
use LWP::Simple;
use Getopt::Std;
use LWP::Simple qw( $ua get );

$ua->proxy( 'http', 'http://http.proxy.fmr.com:8000' );

# Load LWP class for "cookie jar" objects
#$ua->cookie_jar( { file=> "emma_cookie.txt" });
#my $browser = LWP::UserAgent->new( );
my $cookie_jar = HTTP::Cookies->new( 'file' => 'emma_cookie.txt' );
#$browser->cookie_jar( $cookie_jar );
#$cookies->set_cookie(0,'cookiename', 'value','/','emma.msrb.org',80,0,0,86400,0);
#$ua->cookie_jar($cookies);

# Now make your request
my $request = 'http://emma.msrb.org/MarketActivity/RecentTrades.aspx';
print "$request\n";
$cookie_jar->add_cookie_header( $request );
#getstore('http://emma.msrb.org/MarketActivity/RecentTrades.aspx', "MSRB.txt");
#getstore('http://www.nasdaqtrader.com/dynamic/SymDir/otherlisted.txt', "otherlisted.txt");
#getstore('http://www.nasdaqtrader.com/dynamic/SymDir/nasdaqlisted.txt', "nasdaqlisted.txt");
#my $ua = LWP::UserAgent->new;
#$ua->cookie_jar($cookie_jar);
#my $request = GET "http://www.emma.msrb.org"; # don't worry, it doesn't get saved :-)
#my $response = $ua->request( $request );
#$cookie_jar->extract_cookies( $response );

Re: Getting data from an ASPX generated web page
by Loops (Curate) on Jul 18, 2013 at 22:43 UTC

    Here is some code that goes toward your stated goal. I leave it to you to read and abide by the site's TOS.

    use strict;
    use warnings;
    use WWW::Mechanize;
    use HTTP::Cookies;

    my $url     = 'http://target-site.com';
    my $cookies = HTTP::Cookies->new( file => "cookies.dat" );
    my $mech    = WWW::Mechanize->new( cookie_jar => $cookies );

    $mech->get($url);

    my ($button) = $mech->find_all_inputs(
        type       => 'image',
        name_regex => qr/yesButton$/,
    );

    if (defined $button) {
        print "clicking button...\n";
        $mech->click($button->{name});
        $cookies->save;
    }

    my $response = $mech->content();

    The first time it runs, you'll see 'clicking button...'. On subsequent runs this step is skipped, because the saved cookie is reused.
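Reusing the cookie on later runs depends on the jar actually being written to disk. As documented, HTTP::Cookies does not save cookies marked to be discarded (such as session cookies) unless the jar is created with ignore_discard => 1. The following is a minimal sketch of that behaviour, not a diagnosis of any particular site: the cookie name, value, and domain are placeholder values, the helper saved_cookie_count is hypothetical, and the cookie is inserted by hand with set_cookie instead of coming from a real response.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use HTTP::Cookies;

# Hypothetical helper: store one discardable (session-style) cookie,
# save the jar, reload it from disk, and return how many cookies
# survived the round trip.
sub saved_cookie_count {
    my ($ignore_discard) = @_;

    # Use a fresh path in a temporary directory so nothing is loaded
    # when the first jar is created.
    my $dir  = tempdir( CLEANUP => 1 );
    my $file = File::Spec->catfile( $dir, 'cookies.dat' );

    my $jar = HTTP::Cookies->new(
        file           => $file,
        ignore_discard => $ignore_discard,
    );

    # Arguments: version, key, value, path, domain, port, path_spec,
    # secure, maxage, discard. No maxage and discard = 1 models a
    # session cookie. All values here are placeholders.
    $jar->set_cookie( 0, 'Disclaimer', 'Ratings', '/', 'emma.msrb.org',
        undef, 1, 0, undef, 1 );
    $jar->save;

    # Reload from the file and count what was actually written.
    my $reloaded = HTTP::Cookies->new( file => $file, ignore_discard => 1 );
    my $count = 0;
    $reloaded->scan( sub { $count++ } );
    return $count;
}

printf "without ignore_discard: %d cookie(s) on disk\n", saved_cookie_count(0);
printf "with ignore_discard:    %d cookie(s) on disk\n", saved_cookie_count(1);
```

If the second count is higher than the first, the cookie in question is being treated as discardable, and passing ignore_discard => 1 to the jar used by WWW::Mechanize is worth trying.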

      Thanks so much! I feel like it's almost there. I'm just having a bit of trouble saving the cookie. I can see it, but for whatever reason it's not getting saved as I think it should be. Here's my code:
      use strict;
      use warnings;
      use WWW::Mechanize;
      use HTTP::Cookies;

      my $url     = 'http://emma.msrb.org/MarketActivity/RecentTrades.aspx';
      my $cookies = HTTP::Cookies->new( file => "cookies.dat", autosave => 1, ignore_discard => 1 );
      my $mech    = WWW::Mechanize->new( cookie_jar => $cookies );
      $mech->proxy(['http'], 'http://http.proxy.fmr.com:8000');

      my $response = $mech->get($url);
      $cookies->save( "cookies.dat");

      if ($mech->success()){
          print "Successful Connection\n";
      } else {
          print "Not a successful connection\n";
      }

      my ($button) = $mech->find_all_inputs(
          type       => 'image',
          name_regex => qr/yesButton$/,
      );
      print("button = $button->{name}\n");

      if (defined $button) {
          print "clicking button...\n";
          $mech->click($button->{name});
          $cookies->save( "cookies.dat");
          $mech->dump_text;
      }
      When I run it, everything looks good except that this is what I get from the dump_text method:

      Object moved
      HTTP/1.1 302 Found
      Date: Fri, 19 Jul 2013 20:03:11 GMT
      Server: Microsoft-IIS/6.0
      X-Powered-By: ASP.NET
      X-AspNet-Version: 4.0.30319
      Location: http://emma.msrb.org/MarketActivity/RecentTrades.aspx
      Cache-Control: private
      Content-Type: text/html; charset=utf-8
      Content-Length: 170
      Connection: close
      Set-Cookie: Disclaimer=Ratings; expires=Thu, 19-Jul-2063 20:03:11 GMT; path=/
      Object moved to here.

      Nothing is getting into the cookies.dat file besides #LWP-Cookies-1.0. Frustrating. I thought maybe I should call the get method again, but that does nothing different. I'm sure I'm just missing something, and again, thanks so much for your help.