Baz has asked for the wisdom of the Perl Monks concerning the following question:

I'm attempting to fetch data from a web page using get. The page in question allows people to make only 10 views per day, but when I attempt to fetch it using get, I receive a message informing me that all my views are used up for today, despite the fact that this was my first attempt at viewing the page.
At this point, presuming the problem was cookie-related, I went ahead and set up a cookie jar for my Perl program, but this didn't solve my problem.

Just now I tried loading the same page using lynx (from the same server as the calling script) and found that I received my full quota of 10 pages even when I rejected the cookies.
Where does this leave me? What have the browsers (IE6, Netscape 6 and lynx) got that my Perl program doesn't?
Also, if I'm down to my last view for the day in IE6, all I have to do is copy the URL into Netscape and I have ten views again, so the two browsers are being tracked individually. But I can also open a second IE6 window, paste in the URL, and again I have 10 views for that window...
Anyway, help appreciated,
Barry.

Re: Fetching Web Pages using
by crenz (Priest) on Aug 02, 2002 at 04:12 UTC

    I hope what you're trying to do is still ethical...

    Some web pages check all kinds of parameters. I once had the problem that I wanted people to be able to send me SMS via e-mail. I used a Perl script that read the e-mail and accessed my phone company's website to fill in a form and submit the SMS. I started out being honest, giving a nice and true user-agent name (with my e-mail address) etc., but made more and more changes until the site accepted my submissions.

    Well, to cut this short:

    my $agentname = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)';
    my $ua = new LWP::UserAgent;
    $ua->agent($agentname);

    my $request = GET $send_url;

    # Fake IE...
    $request->header('Accept' => 'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, */*');
    $request->header('Accept-Language' => 'en-us');
    $request->header('Referer' => $referer);

    my $response = $ua->request($request);

    It seems likely that you need to accept cookies as well. Take a look at what you are getting -- maybe you need to re-access the page because you got redirected together with a cookie:

    if ($res->is_redirect()) {
        my $loc = $res->header('Location');
        # create a new request like above
        # and re-access the site
    }
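
    Alternatively, note that LWP::UserAgent follows GET redirects by itself, and with a cookie jar attached the cookie set during the redirect travels along automatically. A minimal sketch (the URL is just a placeholder):

    use LWP::UserAgent;
    use HTTP::Cookies;
    use HTTP::Request::Common qw(GET);

    my $ua = LWP::UserAgent->new;
    $ua->cookie_jar(HTTP::Cookies->new);    # in-memory cookie jar

    # LWP follows GET redirects by itself; with a jar attached, any
    # cookie set during the redirect is replayed on the follow-up hop.
    my $res = $ua->request(GET 'http://www.example.com/some/page');
    print $res->code, ' ', $res->request->uri, "\n";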
      Hi again, the web page in question is http://www.bt.co.uk/directory-enquiries/dq_home.jsp

      What I'm trying to do is get an approximation of the popularity of a surname in a particular area, and I don't plan to use any more than the 10 searches I'm allocated per day -- I might even get away with one search per surname. I have the BT CD (95 pounds) but I can't use it for this purpose, and I've contacted BT and the webmaster and got no reply...

      Here's the code I'm using...
      #!/usr/bin/perl #-Tw
      use lib '/home/baz/public_html';
      use strict;
      $| = 1;
      use CGI::Carp "fatalsToBrowser";
      use CGI ":all";
      use DBI;
      use LWP::Simple;
      use LWP::UserAgent;
      use HTML::TokeParser;
      use MyVars qw($footer);
      use HTTP::Cookies;

      my $ua = LWP::UserAgent->new;
      $ua->agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
      my $query = new CGI;
      $ua->cookie_jar(HTTP::Cookies->new(file => "lwpcookies.txt", autosave => 1));
      print $query->header("Ver 1.1"), start_html;

      my $req = HTTP::Request->new(GET => "http://www.bt.co.uk/directory-enquiries/dq_home.jsp");
      my $res = $ua->request($req);

      open (LOG, ">>res.html");
      print LOG $res->content;   # note: "$res->content" in quotes would not call the method
      close (LOG);

        Okay, I've got a working search. Let me describe what I did; I think it is a generally useful learning experience.

        First, I looked at the web page. I decided not to preoccupy myself with how to view it using Perl, but rather to try to submit a search and get some results.

        The source code shows that the form submit is caught by JavaScript and validated. Fair enough. I look out for lines like

        document.dqform.action="/directory-enquiries/dq_locationfinder.jsp";

        and also for submission buttons (there are none) -- and change the action to a test script. In this case, it's my trusty http://www.web42.com/cgi-bin/test.cgi. Nothing special, but effective for this problem.

        I have to admit this is lazy: I make no effort to understand the (hard to read and longish) HTML source, but rather load the page in my browser, enter the desired values, submit it and let my script show what happened ;-). See the result on the results page.
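
        For what it's worth, such a dumper script can be tiny. Here is a sketch of the idea (not the actual test.cgi) -- a CGI that simply echoes back every parameter the form submitted:

        #!/usr/bin/perl
        # Sketch of a form-dumping test script (not the actual test.cgi):
        # echoes back every parameter the browser submitted.
        use strict;
        use CGI;

        my $q = CGI->new;
        print $q->header('text/plain');
        for my $name ($q->param) {
            print "$name=", scalar $q->param($name), "\n";
        }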

        I create a simple script to submit the form using the above variables. It works, but the HTML page contains a warning that my connection expired. Now, "expired connections" always point to some persistent variables, like cookies (which I didn't even enable) -- or session IDs. We have two of these IDs in the variable list of the results mentioned above.

        So I just insert another request to first fetch the search page. Then I search it for the two IDs and use them to submit the search. Voilà!
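
        In outline, the two-step dance looks something like this. The field names (sessionID, searchID) and the extraction regexes are assumptions for illustration -- check the hidden fields in the actual page source:

        use strict;
        use LWP::UserAgent;
        use HTTP::Cookies;
        use HTTP::Request::Common qw(GET POST);

        my $ua = LWP::UserAgent->new;
        $ua->cookie_jar(HTTP::Cookies->new);

        # Step 1: fetch the search page to pick up the session IDs.
        my $page = $ua->request(GET 'http://www.bt.co.uk/directory-enquiries/dq_home.jsp');
        die 'fetch failed: ', $page->status_line unless $page->is_success;

        # The field names and patterns here are guesses -- adjust them to
        # match the hidden <input> tags you actually see in the source.
        my ($session_id) = $page->content =~ /name="sessionID"\s+value="([^"]+)"/;
        my ($search_id)  = $page->content =~ /name="searchID"\s+value="([^"]+)"/;

        # Step 2: submit the search form, carrying both IDs along.
        my $res = $ua->request(
            POST 'http://www.bt.co.uk/directory-enquiries/dq_locationfinder.jsp',
            [
                sessionID => $session_id,
                searchID  => $search_id,
                surname   => 'Smith',    # example search term
            ],
        );
        print $res->content if $res->is_success;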

        Still, there are some caveats. You can play with the limits variable, but there seems to be a cap set by the server (50 results). To get past that, you'll need to do follow-up requests.
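
        If it comes to that, the follow-up requests could look something like the hypothetical loop below; the limits and start field names are pure guesses, so inspect the real form before relying on them:

        # Hypothetical paging loop; 'limits' and 'start' are guessed field
        # names, and $ua, $session_id, $search_id come from the sketch above.
        my $results_url = 'http://www.bt.co.uk/directory-enquiries/dq_locationfinder.jsp';
        for (my $start = 0; $start < 500; $start += 50) {
            my $res = $ua->request(
                POST $results_url,
                [
                    sessionID => $session_id,
                    searchID  => $search_id,
                    surname   => 'Smith',
                    limits    => 50,       # page size
                    start     => $start,   # assumed offset field
                ],
            );
            last unless $res->is_success;
            # ... parse this batch of results before the next request ...
        }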

        Cool! That's interesting. Let me know when you start getting some results... Oh! And sorry I can't be of any help.
Re: Fetching Web Pages using
by belkajm (Beadle) on Aug 02, 2002 at 04:07 UTC

    Could you post the code you're using? Also, is the web page publicly available? If so, could you post the URL?

    Jody