Okay, I've got a working search. Let me describe what I did; I think it is a generally useful learning experience.

First, I looked at the web page. I decided not to worry about how to view it using Perl, but rather to try to submit a search and get some results.

The page source shows that the form submission is intercepted and validated by JavaScript. Fair enough. I look for lines like

document.dqform.action="/directory-enquiries/dq_locationfinder.jsp";

and also for submit buttons (there are none), and point the action at a test script instead. In this case, that's my trusty http://www.web42.com/cgi-bin/test.cgi, which simply echoes back the submitted variables. Nothing special, but effective for this problem.

I have to admit this is lazy: instead of trying to understand the long and hard-to-read HTML source, I load the page in my browser, enter the desired values, submit it, and let my script show what was sent ;-). See the result on the results page.
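The test.cgi script itself isn't shown, so here is a hypothetical stand-in: a minimal sketch of a form-echo CGI in core Perl (no CGI.pm). It assumes a web server that sets the standard CGI environment variables (GATEWAY_INTERFACE, REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH) and simply prints every submitted name/value pair as plain text.

```perl
#!/usr/bin/perl
# Hypothetical stand-in for a form-echo CGI such as test.cgi:
# prints every submitted name/value pair as plain text.
use strict;
use warnings;

# Decode one application/x-www-form-urlencoded component:
# '+' means space, %XX is a hex-encoded byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split "a=1&b=2" into an ordered list of [name, value] pairs,
# keeping empty values (e.g. "GIV=") as empty strings.
sub parse_form {
    my ($data) = @_;
    my @pairs;
    for my $field (split /&/, $data) {
        next if $field eq '';
        my ($name, $value) = split /=/, $field, 2;
        push @pairs, [ url_decode($name), url_decode(defined $value ? $value : '') ];
    }
    return @pairs;
}

sub main {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    my $data   = '';
    if ($method eq 'POST') {
        # POST bodies arrive on STDIN; CONTENT_LENGTH says how many bytes.
        read(STDIN, $data, $ENV{CONTENT_LENGTH} || 0);
    } else {
        $data = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    }
    print "Content-Type: text/plain\r\n\r\n";
    print "Method: $method\n";
    print "$_->[0] = $_->[1]\n" for parse_form($data);
}

# Only act as a CGI when a web server has set up the CGI environment.
main() if $ENV{GATEWAY_INTERFACE};
```

Pointing the form's action at a script like this shows exactly which hidden fields (session IDs and the like) the browser sends along with the visible ones.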

I create a simple script that submits the form with the variables found above. It works, but the returned HTML page contains a warning that my connection has expired. Now, "expired connections" usually point to some persistent state, like cookies (which I didn't even enable) or session IDs. There are two such IDs in the variable list of the results mentioned above: BV_SessionID and BV_EngineID.

So I just add a request that first fetches the search page (dq_home.jsp), search it for the two IDs, and use them to submit the search. Voilà!
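The scraping step can be pulled out into a small, testable function. This is a sketch in core Perl that reuses the same patterns as the script below (NAME=value followed by '&'); the HTML fragment is made up for illustration, and real BV_* values will look different. The url_encode helper is an assumption on my part, a general alternative to escaping only '@' by hand; it treats the input as a byte string.

```perl
#!/usr/bin/perl
# Sketch: pull the two session parameters out of a fetched page and
# percent-encode a value for use in a hand-built query string.
use strict;
use warnings;

# Returns (session_id, engine_id), or dies if either is missing.
sub extract_ids {
    my ($html) = @_;
    my ($sess) = $html =~ /BV_SessionID=([^&]+)&/
        or die "Can't find a session ID!\n";
    my ($eng) = $html =~ /BV_EngineID=([^&]+)&/
        or die "Can't find an engine ID!\n";
    return ($sess, $eng);
}

# Percent-encode every byte except RFC 3986 unreserved characters.
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical page fragment; the '@' mimics the character the script escapes.
my $html = '<a href="dq_home.jsp?BV_SessionID=@@@@123.456@@@@&BV_EngineID=abc.def&x=1">';
my ($sess, $eng) = extract_ids($html);
print "session=", url_encode($sess), " engine=", url_encode($eng), "\n";
```

url_encode is only for building a query string by hand; POST from HTTP::Request::Common already URL-encodes the values in its form list.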

Still, there are some caveats. You can play with the limit variable, but the server seems to cap results at 50 per request; to get more, you'll need to make follow-up requests.

Here's the source code I used:

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET POST);
use LWP::UserAgent;

my $url_home   = "http://www.bt.co.uk/directory-enquiries/dq_home.jsp";
my $url_search = "http://www.bt.co.uk/directory-enquiries/dq_locationfinder.jsp";

my $ua = LWP::UserAgent->new();

# Get a session ID first: fetch the home page and scrape both IDs from it
my $req = GET $url_home;
my $res = $ua->request($req);
die $res->as_string() . "\n" if $res->is_error();

die "Can't find a session ID!\n"
    unless ($res->as_string() =~ /BV_SessionID=([^&]+)\&/);
my $sessID = $1;
die "Can't find an engine ID!\n"
    unless ($res->as_string() =~ /BV_EngineID=([^&]+)\&/);
my $engID = $1;
print STDERR "Got session ID $sessID\n";

# too lazy for urlencode...
# (POST below also URL-encodes form values, so check this step is needed)
$sessID =~ s/\@/%40/g;

# Submit the search, passing the scraped IDs along with the query fields
my $request = POST $url_search, [
    QRY          => 'res',
    BV_SessionID => $sessID,
    BV_EngineID  => $engID,
    new_search   => 'true',
    NAM          => 'Roberts',
    GIV          => '',
    LOC          => 'london',
    STR          => '',
    PCD          => '',
    limit        => '25',
    CallingPage  => 'Homepage',
];
my $response = $ua->request($request);
print $response->as_string();

In reply to Re: Fetching Web Pages using get by crenz
in thread Fetching Web Pages using by Baz