seaver has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I'm using LWP::UserAgent to query a library of scientific citations.

I've figured out most things. However, when I submit a search and the results page comes up, the INITIAL results page is incomplete in its searching; it then updates itself several times within a few seconds to show the complete number of citations it has found.

What this means is that my $response object contains just the results from the first bit of the searching.

How do I code LWP::UserAgent to create a $response object that has ALL the results without resubmitting the search?

Many thanks
Sam Seaver

Replies are listed 'Best First'.
Re: initial http response not complete
by sgifford (Prior) on Sep 20, 2004 at 22:01 UTC
    It depends on how it updates the page. If it uses a Refresh header or the equivalent, you can just keep reloading the URL in this header at the requested intervals. If it uses JavaScript, it's much harder; it's probably easiest to see what the JavaScript does, hack up something similar in Perl, and hope they don't change the code too much. There have been some threads in the past about actually interpreting JavaScript code from Perl, which you can find with SuperSearch.
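
    The Refresh case above can be sketched roughly as follows. This is a minimal sketch, not a tested client for this particular site: parse_refresh and fetch_following_refresh are hypothetical helper names, the hop limit of 5 is arbitrary, and it assumes the server really does send a Refresh header (or a <meta http-equiv="Refresh"> tag, which LWP's default parse_head setting exposes as a header).

```perl
use strict;
use warnings;

# Parse a Refresh value such as "5; url=http://example.com/next".
# Returns (delay_seconds, url); url is undef when only a delay is given.
# Returns an empty list when the value does not look like a Refresh at all.
sub parse_refresh {
    my ($value) = @_;
    return unless defined $value;
    my ($delay, $url) =
        $value =~ /^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?["']?([^"'\s]+)["']?)?\s*$/i
        or return;
    return ($delay, $url);
}

# Hypothetical helper: keep reloading while the server keeps asking for it.
sub fetch_following_refresh {
    my ($url, $max_hops) = @_;
    # Loaded lazily so parse_refresh can be used without LWP installed.
    require LWP::UserAgent;
    require URI;
    my $ua = LWP::UserAgent->new(timeout => 30);
    my $response;
    for (0 .. $max_hops) {
        $response = $ua->get($url);
        last unless $response->is_success;
        my ($delay, $next) = parse_refresh(scalar $response->header('Refresh'));
        last unless defined $delay;          # no Refresh: this page is final
        sleep $delay;                        # wait as long as the page asked
        # Resolve a relative target against the page's base URL.
        $url = URI->new_abs($next, $response->base)->as_string if defined $next;
    }
    return $response;
}

# Only touches the network when a URL is given on the command line.
if (@ARGV) {
    my $final = fetch_following_refresh($ARGV[0], 5);
    print $final->status_line, "\n";
}
```

    If the original search was a POST, reloading with a plain GET may not be equivalent, and a session id kept in a form field would have to be carried along by hand.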
Re: initial http response not complete
by gellyfish (Monsignor) on Sep 21, 2004 at 10:05 UTC

    It is possible that you are getting 'chunked' transfer encoding. Probably the best way of dealing with this is to use the callbacks to the request, as in the second example in lwpcook.

    Of course, without knowing the actual source of the data it is difficult to know.
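
    A callback-based request might look something like the sketch below. It is an illustration only: make_collector and fetch_with_callback are hypothetical names, and the 60-second timeout is an assumption. The request($request, $callback) form itself is the one described in the LWP::UserAgent documentation.

```perl
use strict;
use warnings;

# Build a callback that appends each piece of the body to $$buffer_ref
# and counts how many pieces have arrived in $$count_ref.
sub make_collector {
    my ($buffer_ref, $count_ref) = @_;
    return sub {
        my ($chunk, $response, $protocol) = @_;
        $$buffer_ref .= $chunk;
        $$count_ref++;
        return;
    };
}

# Hypothetical helper: fetch $url, handing the body to the callback piece by piece.
sub fetch_with_callback {
    my ($url) = @_;
    # Loaded lazily so make_collector can be exercised without LWP installed.
    require LWP::UserAgent;
    require HTTP::Request;
    my $ua = LWP::UserAgent->new(timeout => 60);
    my ($body, $pieces) = ('', 0);
    my $request  = HTTP::Request->new(GET => $url);
    my $response = $ua->request($request, make_collector(\$body, \$pieces));
    # With a callback in place, $response->content stays empty;
    # the data ends up in $body instead.
    return ($response, $body, $pieces);
}

# Only touches the network when a URL is given on the command line.
if (@ARGV) {
    my ($response, $body, $pieces) = fetch_with_callback($ARGV[0]);
    printf "%s: %d bytes in %d pieces\n",
        $response->status_line, length $body, $pieces;
}
```

    Note that a callback only lets you watch a single response arrive in pieces. If the page is updated afterwards by a refresh or by JavaScript, those later requests are separate and will not show up here.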

    /J\

      The actual URL is:

      http://wos4.isiknowledge.com/CIW.cgi

      However, you have to go to the 'Web of Science' through the front page to create a session id (which is maintained in a hidden field in all the forms).

      I'm using 'General Search' and the keywords 'Emergence AND Complexity' in the topic field. This should return about 600 results but my initial response only has 62.

      Thanks
      Sam

      _____UPDATE______

      Ok, two things:

      I discovered from the response headers that:

      HTTP/1.1 200 OK
      Connection: close
      Date: Tue, 21 Sep 2004 15:14:31 GMT
      Pragma: no-cache
      Server: Apache/1.3.29 (Unix) mod_perl/1.29 Perl/v5.6.1
      Content-Type: text/html
      Client-Date: Tue, 21 Sep 2004 15:14:35 GMT
      Client-Response-Num: 1
      Client-Transfer-Encoding: chunked
      Content-Base: http://wos4.isiknowledge.com:80/

      Which tells me that the response is 'chunked'. What exactly does this mean? Because I do get a full page for a response.
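
      For reference, chunked transfer coding is only a way of framing the body on the wire: each chunk is preceded by its size in hexadecimal, and a zero-size chunk ends the body. LWP undoes this framing itself and records the fact in the Client-Transfer-Encoding header, so getting a full page is consistent with a chunked response. The sketch below shows the framing on a hand-made string; decode_chunked is a hypothetical helper for illustration (LWP users never need to call anything like it), and it ignores trailers and most error handling.

```perl
use strict;
use warnings;

# Decode a body framed with HTTP/1.1 chunked transfer coding.
# Illustration only: trailers are ignored and a malformed chunk
# header simply ends decoding.
sub decode_chunked {
    my ($raw) = @_;
    my $body = '';
    # Each chunk starts with its size in hex, optional extensions, then CRLF.
    while ($raw =~ /\G([0-9A-Fa-f]+)[^\r\n]*\r\n/gc) {
        my $size = hex $1;
        last if $size == 0;                  # zero-size chunk marks the end
        my $start = pos $raw;
        die "truncated chunk\n" if $start + $size + 2 > length $raw;
        $body .= substr $raw, $start, $size;
        pos($raw) = $start + $size + 2;      # skip the data and its CRLF
    }
    return $body;
}

# Two chunks (5 and 7 bytes) followed by the terminating zero-size chunk.
my $wire = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
print decode_chunked($wire), "\n";
```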

      Also, I checked out some JavaScript:

      http://wos4.isiknowledge.com/DynamicNum.js

      Which seems to have stuff to do with loading, but nothing that forces a reload...

      Or is that what I'm looking for?

      Thanks
      Sam

Re: initial http response not complete
by TedPride (Priest) on Sep 21, 2004 at 01:02 UTC
    What's the page url?
      Reply given below.