pduffy has asked for the wisdom of the Perl Monks concerning the following question:

Stupid I know (read on),

I want to propagate through a biomedical website, retrieving information. Here is a portion of the code:
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);
use URI::URL;

$url = 'http://<website>/search/gene.asp?';
my $req = POST "$url", [ ID1 => 48852, ];
$ua = LWP::UserAgent->new;
$result = $ua->request($req)->as_string;
print $result;
(Ultimately ID1 will simply be a variable from another part of the program).

Now the problem is that it creates a dynamic ASP webpage, http://<website>/search/genesrchresults.asp, BUT you cannot access this directly.

From the script code above $result contains:
HTTP/1.1 302 (Found) Object moved
Cache-Control: private
Connection: Keep-Alive
Date: Mon, 16 Jun 2003 10:40:33 GMT
Location: genesrchresults.asp
Server: Microsoft-IIS/5.0
Content-Length: 140
Content-Type: text/html
Expires: Mon, 16 Jun 2003 09:00:33 GMT
Client-Date: Mon, 16 Jun 2003 10:38:42 GMT
Client-Peer: 63.108.93.17:80
Set-Cookie: ASPSESSIONIDCQTTATSB=JKFMHMDBGODCKACLGPAPBLMB; path=/
Title: Object moved

<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a HREF="genesrchresults.asp">here</a>.</body>
Well, we get a cookie that changes each time the script is run.

Can (and how can) this cookie be used to extract the required information from genesrchresults.asp?

Regards,

pDuffy

Replies are listed 'Best First'.
•Re: Accessing dynamic webpages
by merlyn (Sage) on Jun 16, 2003 at 11:46 UTC
    It might be simpler to get WWW::Mechanize::Shell to write a WWW::Mechanize program for you. It's pretty slick. Mechanize makes a "virtual browser" that hides most of the boring details of a browsing session behind the scenes. The Shell lets you drive it with a series of easy-to-type commands, and can optionally keep a browser synced with the current rendering. When you're done, you say "spit out a program", and it spits out a Mechanize program to repeat the same steps.
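    A hand-written Mechanize program for this search might look roughly like the sketch below. It is not output from the Shell; the helper name search_with_mechanize is made up for illustration, and it assumes the search can be submitted as a plain POST of ID1, as in the original code. Mechanize keeps an in-memory cookie jar by default and, as far as I know, also follows redirects after a POST.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: run the search through a Mechanize "virtual browser".
sub search_with_mechanize {
    my ($url, $id) = @_;

    # A new Mechanize object carries its own in-memory cookie jar,
    # so the ASP session cookie is stored and replayed automatically.
    my $mech = WWW::Mechanize->new;

    # post() is inherited from LWP::UserAgent; Mechanize tracks the
    # resulting page so that content() returns what a browser would show.
    $mech->post($url, [ ID1 => $id ]);

    return $mech->content;
}
```

    Calling search_with_mechanize('http://<website>/search/gene.asp?', 48852) with the real host filled in should return the HTML of whatever page the server finally serves.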

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: Accessing dynamic webpages
by gellyfish (Monsignor) on Jun 16, 2003 at 11:29 UTC

    You need to set redirect_ok() to a true value on the LWP::UserAgent object and it should take care of the redirection for you. You will also need to add a cookie jar so that the appropriate Cookie header is sent, i.e. add this before you call request():

    use HTTP::Cookies;
    $ua->redirect_ok(1);
    $ua->cookie_jar(HTTP::Cookies->new(file => '/tmp/cookies.txt', autosave => 1));
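    For reference, a minimal sketch of the same idea, assuming an LWP recent enough to have requests_redirectable. In those versions redirect_ok() is a callback the agent consults rather than a setter, so the usual way to let a POST be redirected is to add POST to that list. The helper names make_agent and search_gene and the in-memory cookie jar are assumptions for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request::Common qw(POST);

# Hypothetical helper: build an agent that keeps cookies and
# follows the 302 that the server sends in reply to a POST.
sub make_agent {
    my $ua = LWP::UserAgent->new;

    # In-memory jar; pass file => ..., autosave => 1 to persist it.
    $ua->cookie_jar(HTTP::Cookies->new);

    # LWP follows redirects for GET and HEAD by default; add POST.
    push @{ $ua->requests_redirectable }, 'POST';

    return $ua;
}

# Hypothetical helper: submit the search and return the final page,
# or undef if the last response in the chain was not a success.
sub search_gene {
    my ($ua, $url, $id) = @_;
    my $response = $ua->request(POST $url, [ ID1 => $id ]);
    return $response->is_success ? $response->decoded_content : undef;
}
```

    With this in place, search_gene(make_agent(), $url, 48852) should come back with the results page rather than the "Object moved" stub, provided the server only needs the session cookie.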

    /J\
    
Re: Accessing dynamic webpages
by zby (Vicar) on Jun 16, 2003 at 11:07 UTC
    I think just adding one more request would do what you need. I assume you are always redirected to the same address just with a different cookie. Untested:
    my $req2 = GET "$url";
    $result = $ua->request($req2)->as_string;
    print $result;

    Update: I forgot about the cookie jar. Of course you need that too.

      zby, gellyfish,

      Thanks for the insights. Not quite there yet. I think it boils down to the way I'm using the required cookie. Here's the code again:

      1:use LWP::Simple;
      2:use LWP::UserAgent;
      3:use HTTP::Request::Common qw(POST);
      4:use HTTP::Request::Common qw(GET);
      5:use HTTP::Cookies;
      6:$url = 'http://www.biocarta.com/search/gene.asp?';
      7:my $request = POST "$url",
      8:[
      9:ID1 => 48852,
      10:];
      11:my $ua = LWP::UserAgent->new;
      12:my $response = $ua->request($request);
      13:print $response->as_string;

      14:# Okay. Let's set up a cookie jar, and extract the cookie
      15:my $cjar = HTTP::Cookies->new(file => 'cookies.txt', autosave => 1);
      16:$cjar->extract_cookies($response);

      17:# As per your suggestions
      18:my $req2 = GET "$url";
      19:$cjar->add_cookie_header($req2);
      20:$ua->redirect_ok($req2);
      21:$response2 = $ua->request($req2);
      22:print $response2->as_string;

      As it is, it returns the genesrchresults.asp webpage, but without the search results (before, it returned undef). If I use $request on line 21 instead of $req2, it returns a 'Doh! page cannot be displayed error blah blah'.

      Any more ideas?

      pDuffy.
        You need to add the cookie jar to the user agent:
        $ua->cookie_jar(HTTP::Cookies->new(file => 'cookies.txt', autosave => 1));
        Do it at the beginning of the code, just after the creation of the user agent, not after the first request. You don't need to extract the cookies or anything; just add the cookie jar, that's all. And you don't need redirect_ok when you do the redirection manually.
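        Put together, the manual approach could be sketched as follows. This is untested against the real site; the helper names redirect_target and search_gene_manually are invented for illustration, and it assumes the server answers the POST with a Location header, as in the 302 response quoted in the question.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request::Common qw(GET POST);
use URI;

# Hypothetical helper: turn the (possibly relative) Location header
# of a redirect response into an absolute URL.
sub redirect_target {
    my ($response, $base) = @_;
    my $location = $response->header('Location');
    return unless defined $location;
    return URI->new_abs($location, $base)->as_string;
}

# Hypothetical helper: POST the search, then fetch the page the
# server points to, using the same agent for both requests.
sub search_gene_manually {
    my ($url, $id) = @_;

    my $ua = LWP::UserAgent->new;
    # Attach the jar before the first request so the session cookie
    # is stored and sent back automatically on the second one.
    $ua->cookie_jar(HTTP::Cookies->new);

    my $first  = $ua->request(POST $url, [ ID1 => $id ]);
    my $target = redirect_target($first, $url) or return $first;

    return $ua->request(GET $target);
}
```

        The second request goes to the address from the Location header (genesrchresults.asp, resolved against the search URL) rather than back to gene.asp, and the cookie jar on $ua supplies the session cookie without any extract_cookies or add_cookie_header calls.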

        I would correct your code if you cared to add code tags.