Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a web client script that implements the
following scenario that is done using a browser:
1. set url to a site's login page (say http://www.someplace.com/login.asp,
for example)
2. Type in a valid email address and password in a form
3. click the submit button
4. the browser goes to the "home" page which means I'm logged in
(in this example http://www.someplace.com/home.asp )
5. once here I do other stuff - more forms filled out and submit buttons
clicked, etc.

So far the script only manages to get to step 3.2
(not even halfway to step 4!). The login works (or seems to).

The steps in the script are basically this:

1. get a UserAgent object
2. Create a request object with the POST method and the url.
3. make a request with the agent and get a response
4. parse the content and get a form object
5. set the email and password in the form object with the valid values
6. create a new request by "clicking" the submit button like this:
my $new_req = $form->click;
7. have the agent make another request using the request created in step 6
and get the response
8. PROBLEM IS HERE - now I want to go to the http://www.someplace.com/home.asp
url, and try to do so
9. create a third request object with POST using the above url (I also tried
setting the url using $response->header('location') and it does not work either)
10. have the agent send the request and get the response.
11. end result a code of 302, the "Object moved" title, and an
href="home.asp", not the page I wanted!
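
For concreteness, steps 4 to 6 look roughly like the sketch below. It runs
offline: the form HTML, the field names "email" and "password", and the
credentials are stand-ins I made up for illustration, not the real site's.

```perl
use strict;
use warnings;
use HTML::Form;

# Stand-in for the login page HTML (hypothetical markup and field names).
my $html = <<'HTML';
<form method="post" action="login.asp">
  <input type="text" name="email">
  <input type="password" name="password">
  <input type="submit" name="submit" value="Log in">
</form>
HTML

# Step 4: parse the content and get a form object. The second argument is
# the base URL used to resolve the form's relative action.
my $form = HTML::Form->parse($html, 'http://www.someplace.com/login.asp');

# Step 5: set the email and password fields (placeholder values).
$form->value(email    => 'me@example.com');
$form->value(password => 'secret');

# Step 6: "click" the submit button to build the next request.
my $new_req = $form->click;

# Show what would be sent in step 7.
print $new_req->method, ' ', $new_req->uri, "\n";
print $new_req->content, "\n";
```

The request that comes back from click is an ordinary HTTP::Request, so it
can be handed straight to the agent in step 7.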

On reading the available doc (or at least what I could find)
my understanding is that each request/response pair is stateless.
Does this mean that the server is firing up a new instance of the
cgi script for each request and when the response comes back,
the cgi script ends? If so, the connection is broken.
How can I achieve persistence such that I can keep the same
instance of the cgi script alive while MY web client script is alive?
This way, it will remember that I'm already logged in. (I'd love a module
called LWP::UserAgent::Persistent !!) Am I thinking of this problem the right way?
I'm new to using modules in perl, and not very experienced
with web programming. I'm comfortable enough with perl to parse out the data I
want once I can get the right page coming back to me.

Am I not handling the redirection correctly? I know I want to follow
the link "home.asp" that is returned (this is how the browser works).
Why doesn't the server send me the same page as the browser when I log in?

Any help appreciated, I'm basically stuck here and don't know what
to try next. This MUST be possible!

My apologies if you have seen this question before.

Thanks
Casey

Re: Getting past login screen in web client
by atcroft (Abbot) on Apr 17, 2002 at 23:39 UTC

    I hope others more knowledgeable will reply, but upon reading, I would offer the following possibilities:

    • the server may be using name-based virtual hosting, which would (I believe) require an HTTP/1.1 "Host: sitename" header in the request, and
    • the page/site may be using cookies for maintaining state and requesting that information further on
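
    If cookies are the issue, a minimal sketch along these lines may help. It assumes the site tracks the login with a cookie, and that the form fields are named "email" and "password"; both are guesses, as are the helper names redirect_target and login_and_fetch_home. It keeps one agent with a cookie jar for the whole session and follows the 302 by hand with a GET, resolving a relative Location such as "home.asp" against the login URL. Depending on the LWP version, the agent may already follow the redirect itself, in which case the manual step is simply skipped.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request::Common qw(GET);
use HTML::Form;
use URI;

# One agent for the whole session. The in-memory cookie jar stores any
# cookie the server sets at login and sends it back on later requests.
my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);

# Resolve the Location header of a redirect against the response's base URL,
# so a relative value such as "home.asp" becomes an absolute URL.
sub redirect_target {
    my ($response) = @_;
    my $location = $response->header('Location');
    return unless $response->is_redirect && defined $location;
    return URI->new_abs($location, $response->base);
}

# Log in, then follow the redirect by hand with a GET on the same agent.
sub login_and_fetch_home {
    my ($login_url, $email, $password) = @_;

    my $page = $ua->request(GET $login_url);
    die 'login page: ' . $page->status_line . "\n" unless $page->is_success;

    my $form = HTML::Form->parse($page->content, $page->base)
        or die "no form found on login page\n";
    $form->value(email    => $email);       # assumed field name
    $form->value(password => $password);    # assumed field name

    my $response = $ua->request($form->click);
    if (my $target = redirect_target($response)) {
        $response = $ua->request(GET $target);
    }
    return $response;
}

# Only touches the network when run as: script login-url email password
if (@ARGV == 3) {
    my $home = login_and_fetch_home(@ARGV);
    print $home->status_line, "\n";
}
```

    Any later requests (step 5 in the question) would go through the same $ua so that the stored cookie keeps being sent.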

    I would recommend chapter 20, "Web Automation," in Perl Cookbook, or Web Client Programming with Perl (the latter, I understand, is no longer in print, although there may be some information about it on the O'Reilly website). Some of the links here, especially in the Outside Links section, may also be of assistance.

    I hope this helps somewhat, and good luck.

      Since Web Client Programming with Perl is now out of print, O'Reilly has provided the book online here.