in reply to Fetching Web Pages using get

Okay, I've got a working search. Let me describe what I did; I think it's a generally useful learning experience.

First, I looked at the web page. I decided not to preoccupy myself with how to view it using Perl, but rather to try to submit a search and get some results.

The source code shows that the form submit is caught by JavaScript and validated. Fair enough. I look out for lines like

document.dqform.action="/directory-enquiries/dq_locationfinder.jsp";

and also for submission buttons (there are none) -- and change the action in a local copy of the page to point at a test script that just shows the variables it receives. In this case, it's my trusty http://www.web42.com/cgi-bin/test.cgi. Nothing special, but effective for this problem.
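If you don't have such a script lying around, something along these lines should do. This is only a minimal sketch of an echo script, not the actual test.cgi (which isn't shown here); the names `url_decode` and `parse_form` are my own, and it assumes an ordinary CGI environment and a form sent as application/x-www-form-urlencoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one application/x-www-form-urlencoded component.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                                # '+' encodes a space
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX -> byte
    return $s;
}

# Split a query string or POST body into [name, value] pairs, in order.
sub parse_form {
    my ($raw) = @_;
    my @pairs;
    for my $field (split /[&;]/, $raw) {
        next if $field eq '';
        my ($name, $value) = split /=/, $field, 2;
        $value = '' unless defined $value;        # "name" without "="
        push @pairs, [ url_decode($name), url_decode($value) ];
    }
    return @pairs;
}

# When run as a CGI script, echo whatever the form sent as plain text.
if ($ENV{REQUEST_METHOD}) {
    my $raw = $ENV{QUERY_STRING} // '';
    if ($ENV{REQUEST_METHOD} eq 'POST') {
        read(STDIN, $raw, $ENV{CONTENT_LENGTH} || 0);
    }
    print "Content-Type: text/plain\r\n\r\n";
    print "$_->[0] = $_->[1]\n" for parse_form($raw);
}
```

Keeping the pairs in order (instead of stuffing them into a hash) means hidden fields show up exactly as the browser sent them, which is what you want when you are about to replay the request by hand.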

I have to admit this is lazy: I make no effort to understand the (hard to read and longish) HTML source, but rather load the page in my browser, enter the desired values, submit it and let my script show what happened ;-). See the result on the results page.

I create a simple script to submit the form using the above variables. It works, but the HTML page contains a warning that my connection expired. Now, "expired connections" always point to some persistent variables, like cookies (which I didn't even enable) -- or session IDs. We have two of these IDs in the variable list of the results mentioned above.

So I just insert another request to first fetch the search page. Then I search it for the two IDs and use them to submit the search. Voilà!
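The ID-scraping step is easy to pull out into a small helper, so you can try it against a saved copy of the page before going online. A minimal sketch: `extract_ids` is a name I made up, the sample fragment is invented for illustration, and I'm assuming the IDs show up as `BV_SessionID=...` and `BV_EngineID=...` parameters inside links on the page.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the two IDs out of the page text.
# Returns (session_id, engine_id), or an empty list if either is missing.
sub extract_ids {
    my ($html) = @_;
    # Stop at '&', quotes or whitespace, so an ID at the end of a URL
    # is found as well.
    my ($sess) = $html =~ /BV_SessionID=([^&"'\s]+)/ or return;
    my ($eng)  = $html =~ /BV_EngineID=([^&"'\s]+)/  or return;
    return ($sess, $eng);
}

# Invented sample fragment, only to show the call.
my $sample = '<a href="/x.jsp?BV_SessionID=@@@@0123.4567@@@@&BV_EngineID=abcd.0&y=1">';
my ($s, $e) = extract_ids($sample);
print "session=$s engine=$e\n";
```

Returning an empty list on failure lets the caller write `my ($s, $e) = extract_ids($page) or die ...` and handle a changed page layout in one place.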

Still, there are some caveats. You can play with the limit variable, but the server seems to cap it (at 50). To get more results than that, you'll need to do follow-up requests.

Here's the source code I used:

#!/usr/bin/perl -w
use strict;
use warnings;
use HTTP::Request::Common qw(GET POST);
use LWP::UserAgent;

my $url_home   = "http://www.bt.co.uk/directory-enquiries/dq_home.jsp";
my $url_search = "http://www.bt.co.uk/directory-enquiries/dq_locationfinder.jsp";

my $ua = LWP::UserAgent->new();

# Get a session ID first
my $req = GET $url_home;
my $res = $ua->request($req);
die $res->as_string() . "\n" if $res->is_error();

die "Can't find a session ID!\n"
    unless ($res->as_string() =~ /BV_SessionID=([^&]+)\&/);
my $sessID = $1;
die "Can't find an engine ID!\n"
    unless ($res->as_string() =~ /BV_EngineID=([^&]+)\&/);
my $engID = $1;
print STDERR "Got session ID $sessID\n";

# too lazy for urlencode...
$sessID =~ s/\@/%40/g;

my $request = POST $url_search, [
    QRY          => 'res',
    BV_SessionID => $sessID,
    BV_EngineID  => $engID,
    new_search   => 'true',
    NAM          => 'Roberts',
    GIV          => '',
    LOC          => 'london',
    STR          => '',
    PCD          => '',
    limit        => '25',
    CallingPage  => 'Homepage',
];
my $response = $ua->request($request);
print $response->as_string();
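About the "too lazy for urlencode" shortcut: if you ever need to escape more than the @ sign by hand, a general routine is only a few lines. This is a sketch under the assumption that the value is a plain byte string; `url_encode` is my own name for it, and the URI::Escape module offers `uri_escape` for the same job.

```perl
use strict;
use warnings;

# Percent-encode everything except the URL "unreserved" characters
# (letters, digits, '-', '_', '.', '~'). Assumes a byte string.
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

print url_encode('@@a b&c'), "\n";
```

Keep in mind that POST from HTTP::Request::Common form-encodes the values it is given, so it is worth dumping the request with `$request->as_string()` to see what actually goes over the wire before adding an escape step of your own.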