Baz has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,
I'm trying to simulate a search using the search engine here.
First of all, I have to get the engine and session ids, which is done by requesting the page linked to above and then searching the source, where these ids are embedded.
Second, I search for all Griffin names in the "BT" postcode area. This should yield 3 pages of results: 50 on each of the first two pages, and 3 on the last. The code below manages to read the session and engine ids and use them to retrieve the first page of results, but the second and third pages do not contain the search results I would expect. (The page shows an error message saying that the server is busy.)
Just after writing the above, I changed ignore_discard to 0; this helped. At first I thought I had got things working, but when I ran it the 4th time, back came the gremlins. Now, sometimes the 2nd page displays, sometimes not; on some occasions both pages 2 and 3 display. This is obviously to do with timing, but I don't understand it, as you can leave minute-long pauses between checking the next page in a browser and it works fine, every time. The only time it doesn't is when you leave it for about half an hour, or something like that.

Can anyone suggest what I'm doing wrong?

I asked this question a while back, but I got sick of trying to get it to work, so I've been doing other things since. Here's a link to those discussions.
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;
    use HTML::TokeParser;
    use URI;
    use LWP::UserAgent;
    use HTTP::Request;
    use HTTP::Headers;
    use HTTP::Response;
    use HTTP::Cookies;
    use HTML::LinkExtor;
    use HTTP::Request::Common qw(GET POST);

    my $name = "Griffin";
    my $WHATWORKS = "http://www.bt.co.uk/directory-enquiries/dq_home.jsp";
    $WHATWORKS = URI->new($WHATWORKS);

    my $cookie_file = "cookies.txt";
    my $cookie_jar = HTTP::Cookies->new(
        file           => $cookie_file,
        autosave       => 1,
        ignore_discard => 1,    # IMPORTANT!!!!!!!!!!!!
    );

    my $url_search;
    my $url_home = "http://www.bt.co.uk/directory-enquiries/dq_home.jsp";
    my $ua = new LWP::UserAgent();
    $ua->agent( "Mozilla/8.0(${^O};retmaspod)" );
    $ua->cookie_jar( $cookie_jar );

    ###########################################################
    ## Get Main Search Page - this page contains the engine and session ids
    my $req = GET $url_home;
    my $res = $ua->request( $req );
    open (LOG,">save.html");
    my $fileOut = $res->content();
    print LOG "$fileOut";

    ###########################################################
    ## Get 1st Page of Results
    my %FORMOLA;
    ParseIt( \$res->{_content} );    # Get Ids
    $WHATWORKS->query_form(
        BV_EngineID  => $FORMOLA{BV_EngineID},
        BV_SessionID => $FORMOLA{BV_SessionID},
        QRY          => "res",
        new_search   => "true",
        NAM          => $name,
        PCD          => "BT",
        limit        => "50",
        CallingPage  => "Homepage",
        STR          => "",
        LOC          => "",
        GIV          => ""
    );
    warn Dumper{ $WHATWORKS->query_form};

    ###########################################################
    ## Get Subsequent page for this name search
    $url_search = $WHATWORKS;
    $req = GET $url_search;
    $res = $ua->request($req);
    $fileOut = $res->content();
    print LOG "$fileOut";

    ## find how many pages in this search by matching Page 1 of *
    if( $res->content() =~ /Page (\d+) of (\d+)/)
    {
        print "\nPages: $2";
    }

    # for each remaining page
    for(my $i=1;$i<$2;$i++)
    {
        my $startId = $i*50;
        my $WHATWORKS2 = "http://www.bt.co.uk/directory-enquiries/dq_home.jsp?Homepage&start_id=25&lci=0&QRY=res&NAM=Griffin&PCD=BT";
        print "\nhttp://www.bt.co.uk/directory-enquiries/dq_home.jsp?Homepage&start_id=25&lci=0&QRY=res&NAM=Griffin&PCD=BT",
        $WHATWORKS2 = URI->new($WHATWORKS2);
        $url_search = $WHATWORKS2;
        $req = GET $url_search;
        $res = $ua->request($req);
        $fileOut = $res->content();
        print LOG "$fileOut";
    }

    sub ParseIt
    {
        my $p = new HTML::TokeParser( $_[0] );
        while(my $t = $p->get_token() )
        {
            my $ttype = shift @{ $t };
            if($ttype eq "S")    # start tag?
            {
                my($tag, $attr, $attrseq, $rawtxt) = @{ $t };
                if($tag eq 'input' && $attr->{type} eq 'hidden' )
                {
                    $FORMOLA{ $attr->{name} } = $attr->{value}
                }
            }
        }
    }

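One thing that stands out in the code above: the loop computes $startId but never uses it, and the follow-up URL is hard-coded with start_id=25 and without the BV_EngineID and BV_SessionID values. Below is a minimal sketch of building each page's URL from those values. It assumes (nothing in this thread confirms it) that the server expects start_id to advance by the page size and accepts the ids on follow-up requests. The helper name page_url and its parameters are hypothetical.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: build the URL for result page $page (0-based),
# carrying the engine and session ids and a computed start_id.
# Which parameters the server really needs is an assumption.
sub page_url {
    my ($base, $ids, $name, $page, $per_page) = @_;
    my $uri = URI->new($base);
    $uri->query_form(
        BV_EngineID  => $ids->{BV_EngineID},
        BV_SessionID => $ids->{BV_SessionID},
        QRY          => "res",
        NAM          => $name,
        PCD          => "BT",
        start_id     => $page * $per_page,
        lci          => 0,
    );
    return $uri;
}
```

In the loop this would replace the hard-coded string, for example `GET page_url($url_home, \%FORMOLA, $name, $i, 50)`.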
Replies are listed 'Best First'.
Re: Cookies and Session Ids
by fglock (Vicar) on Aug 25, 2002 at 02:17 UTC

    Why not just retry when you get "server busy"? Wait a few seconds and redo the request.

      said like a true engineer ;)
        I seem to have insulted someone with that last comment - I'm an engineer myself, BTW. And I always think that a good engineer is someone who thinks in terms of getting something to work, instead of worrying about the "why??" aspect of things - but that's just me.
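
The retry suggestion above could be sketched roughly as follows. This is only an illustration: the helper name retry, the number of attempts, and the delay are hypothetical, and the "server is busy" pattern in the usage comment is a guess at the wording of the error page described in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->() up to $max_tries times, sleeping
# $delay seconds between attempts, until $is_ok->($result) returns true.
sub retry {
    my ($fetch, $is_ok, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $result = $fetch->();
        return $result if $is_ok->($result);
        sleep $delay if $try < $max_tries;
    }
    return;    # every attempt failed
}

# Possible use with the script's $ua and $url_search (assumed names
# from the question; the error text is an assumption):
#
# my $res = retry(
#     sub { $ua->request(GET $url_search) },
#     sub { $_[0]->is_success && $_[0]->content !~ /server is busy/i },
#     5, 3,
# );
```

Retrying only hides the symptom; it does not explain why the later pages fail in the first place.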