
POSTing information on a web page

by clone4 (Sexton)
on Jun 16, 2008 at 14:17 UTC (#692261)

clone4 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,
I've started writing a script that should log me in to a web page. The headers are as follows:

POST /index.php HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20080208 Mandriva/ (2008.0) Firefox/
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer:
Cookie: __utma=240219034.940153525.1209163011.1213625020.1213626434.79;
  __utmb=240219034; fusion_visited=TRUE;
  __utmz=240219034.1209310076.15.2.utmccn=(organic)|utmcsr=google|utmctr=site%3Awww.hellbou|utmcmd=organic;
  PHPSESSID=rl6oeo664r7g9lnd7gin5ilvn3; __utmc=240219034
Content-Type: application/x-www-form-urlencoded
Content-Length: 47

user_name=***&user_pass=***&login=Login

Response:

HTTP/1.x 302 Found
Date: Mon, 16 Jun 2008 13:48:06 GMT
Server: Apache
X-Powered-By: PHP/5.0.4
Set-Cookie: PHPSESSID=rl6oeo664r7g9lnd7gin5ilvn3; path=/
Set-Cookie: fusion_user=24236.f31e0a8a2cefed417ec46c7675e4142d;
  expires=Mon, 16 Jun 2008 16:48:06 GMT; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate,
  post-check=0, pre-check=0
Pragma: no-cache
Location: index.php
Content-Length: 0
Connection: close
Content-Type: text/html

For that I used this code:

use LWP::UserAgent;
use HTTP::Cookies;

$agent = LWP::UserAgent->new;
$agent->agent('Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/0000000000 Mandriva/ (2008.0) Firefox/2.0.0.13');
$url = '';

my $request = HTTP::Request->new(POST => $url);
$request->content_type("application/x-www-form-urlencoded");
$request->content('user_name=***&user_pass=***&login=Login');
$request->content_length(47);
$request->referer('');
$respond = $agent->request($request);
print "this is request ".$request->as_string()."\n";
print "this is respond".$respond->as_string()."\n";

my $request = HTTP::Request->new(GET=>$url);
$response = $agent->request($request);
if ( $response->is_success) {
    print $request->as_string();
    print $response->content;
    die;
}

But then the GET request returns a page where I'm not logged in. I printed out the response headers and they look correct, so I have no idea what's wrong. What's also strange is that this code normally returns the page source as well (without the second GET request), but here it only returns the headers.

Then I tried the WWW::Mechanize module, but it doesn't work either, because the site requires a valid referer when you log in, and I can't find any way to edit the requests that WWW::Mechanize sends. Here is the code of that script:

use WWW::Mechanize;
use HTML::Form;
use LWP::UserAgent;

my $agent = LWP::UserAgent->new;
my $url = '';
my $response = $agent->get($url);
my $mech = WWW::Mechanize->new ;
my @forms = HTML::Form->parse($response);
my $username = $forms[1]->find_input("user_name","text");
my $password = $forms[1]->find_input("user_pass","password");
$username->value("***");
$password->value("***");
my $filled_out_request = $forms[1]->click;
$response = $agent->request($filled_out_request);
print $response->content;

And this is the source of the form (it's the 2nd form on the page):

<form id='loginform' method='post' action='index.php'>
<div style="text-align: center;">
<input type='text' name='user_name' class='textbox' style='width:100px' /><br />
<input type='password' name='user_pass' class='textbox' style='width:100px' /><br />
<input type='checkbox' name='remember_me' value='y' />Remember Me<br /><br />
<input type='submit' name='login' value='Login' class='button' /><br />
</div>
</form><br >

Sorry for the length of this post; I just wanted to make sure everything is included.
Thanks for any help

Replies are listed 'Best First'.
Re: POSTing information on a web page
by Corion (Patriarch) on Jun 16, 2008 at 14:25 UTC

    I think you're taking quite a roundabout route to filling in the form with WWW::Mechanize. WWW::Mechanize itself takes care of sending a correct Referer header, and you can fill in a form directly if you know the fields that are on it:

    # Select the login form
    $mech->form_with_fields('user_name','user_pass');
    $mech->set_fields(
        user_name => 'username',
        user_pass => 'secr1t'
    );
    $mech->click('login');
    print $mech->content;

      And if you need a different Referer header, why not add it yourself? How to do that is in the POD.

      Update: as Corion pointed out, using for links isn't really nice, but since the real URL doesn't go through the PerlMonks parser, I don't really have another option. The link goes to CPAN and yes, it's SFW.

      Yeah, I know, but I still can't get it working, even with your code snippet. It says it can't find any form including these fields, which probably means I'm not passing the page correctly. I guess I need to read up more on how to use this module correctly, because it still confuses me quite a bit.
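
A minimal sketch of the whole WWW::Mechanize flow may help here. It assumes the "can't find any form" error comes from `$mech` never having fetched the page itself (in the question the page was fetched with a separate LWP::UserAgent object), which is a guess rather than something the thread confirms. The sub name `mech_login` and the command-line arguments are hypothetical; the form field names are the ones from the posted form source.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Mechanize keeps its own in-memory cookie jar unless told otherwise.
my $mech = WWW::Mechanize->new;

# Hypothetical helper: $url, $user and $pass are supplied by the caller.
sub mech_login {
    my ($mech, $url, $user, $pass) = @_;

    # Optional: force a specific Referer on every request this object sends.
    $mech->add_header( Referer => $url );

    # Mechanize can only search forms on a page it has fetched itself.
    $mech->get($url);

    # Select the login form by its fields, fill it in and submit it.
    $mech->form_with_fields('user_name', 'user_pass');
    $mech->set_fields( user_name => $user, user_pass => $pass );
    $mech->click('login');

    return $mech->content;
}

# Only touch the network when URL, user name and password are given.
print mech_login($mech, @ARGV) if @ARGV == 3;
```

Run it as `perl login.pl URL USER PASS` (the script name is arbitrary). Without arguments it only builds the object and exits.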
Re: POSTing information on a web page
by tachyon-II (Chaplain) on Jun 16, 2008 at 15:04 UTC

    You need to send the session cookie(s) back with every page request. If you look at the response you will see your login tokens:

    Set-Cookie: PHPSESSID=rl6oeo664r7g9lnd7gin5ilvn3; path=/
    Set-Cookie: fusion_user=24236.f31e0a8a2cefed417ec46c7675e4142d; expires=Mon, 16 Jun 2008 16:48:06 GMT; path=/

    These tokens are how the server knows you logged in. Just set up an HTTP::Cookies cookie jar and it will work fine. If you want the executive answer (using your first example):

    use LWP::UserAgent;
    use HTTP::Cookies;  # not actually required as LWP uses it,
                        # so you don't need to use it again....
    $agent = LWP::UserAgent->new;
    $agent->cookie_jar( {} );  # temporary cookie jar
    $agent->agent( [snip]
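
For completeness, here is a sketch of how the first script might look with the cookie jar in place. The sub name `login_and_fetch` and the command-line arguments are hypothetical, and the form fields are taken from the request shown in the question. As far as I know, LWP does not follow a redirect in response to a POST by default, which would fit the header-only 302 response described in the question, so the sketch treats a redirect as a successful login and fetches the page with a separate GET.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# An agent whose cookie jar lives in memory for the life of the script.
my $agent = LWP::UserAgent->new;
$agent->cookie_jar( {} );

# Hypothetical helper: $url, $user and $pass are supplied by the caller.
sub login_and_fetch {
    my ($agent, $url, $user, $pass) = @_;

    # post() encodes the form fields and sets Content-Type itself,
    # so nothing has to be counted or hard-coded by hand.
    my $login = $agent->post(
        $url,
        { user_name => $user, user_pass => $pass, login => 'Login' },
        Referer => $url,
    );
    die 'Login failed: ' . $login->status_line
        unless $login->is_success || $login->is_redirect;

    # The jar now holds the cookies from the login response and
    # LWP sends them back with this request.
    my $page = $agent->get($url);
    die 'Fetch failed: ' . $page->status_line unless $page->is_success;

    return $page->decoded_content;
}

# Only touch the network when URL, user name and password are given.
print login_and_fetch($agent, @ARGV) if @ARGV == 3;
```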

      ... or just use WWW::Mechanize, which also does the cookie handling transparently.

      Thanks a lot, I didn't realise this. Well, I have to get my head around how to use the module correctly. I guess I will have to get the cookie from the response using a regex, and then pass it before GETing the page again... my god, that will be a long night :)

        You're still thinking far too low-level about things. Read the WWW::Mechanize documentation. It does the cookie handling for you completely. You can even save the cookie jar to disk so you don't need to log in again the next time your script runs.

        If you insist on doing things yourself, consider using HTTP::Cookies instead of extracting the cookie information yourself. But then, LWP::UserAgent already does that for you.

        No, no, no. For a start, modules have methods to do this sort of thing, e.g. $lwp->cookie_jar->extract_cookies($response). However, LWP takes care of this for you. *All you need to do is set up the cookie jar as shown*. The cookies you are sent will automatically be included in the headers sent by LWP as appropriate. Note that in LWP the cookie_jar() method just returns a reference to an HTTP::Cookies object, so you can call any HTTP::Cookies method on $lwp->cookie_jar. Although you don't generally need to do it, WWW::Mechanize is a subclass of LWP::UserAgent, so (more or less) anything you can do with an LWP object (in terms of calling methods) you can do with your Mech object (unless that method has been overridden).
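
To make the mechanics concrete, the following offline sketch does by hand what LWP does with a cookie jar on each round trip: extract cookies from a response, then add them to the next request. Nothing is sent over the network. The URL is a hypothetical stand-in for the real site, and the response is built by hand from the Set-Cookie header quoted earlier in the thread.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# Hypothetical URL standing in for the site's login page.
my $url = 'http://www.example.com/index.php';

# A hand-built login response carrying the session cookie.
my $login    = HTTP::Request->new( POST => $url );
my $response = HTTP::Response->new( 302, 'Found' );
$response->header( 'Set-Cookie' => 'PHPSESSID=rl6oeo664r7g9lnd7gin5ilvn3; path=/' );
$response->request($login);    # the jar needs to know which URL set the cookie

my $jar = HTTP::Cookies->new;
$jar->extract_cookies($response);    # LWP does this after every response

my $next = HTTP::Request->new( GET => $url );
$jar->add_cookie_header($next);      # LWP does this before every request

# The follow-up request now carries the session cookie.
print $next->as_string;
```

With `$agent->cookie_jar( {} )` in place, neither of the two jar calls has to appear in your own script.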

        If you are "stopping and starting" your program you can avoid logging in every time you run the program by saving the cookie to disk. See the docs, read them, and understand them.
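
A small sketch of a disk-backed jar, assuming a writable file called `cookies.txt` (a hypothetical name) in the current directory. Session cookies such as PHPSESSID carry no expiry date, so HTTP::Cookies would normally drop them when saving; `ignore_discard` keeps them.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical file name; any writable path will do.
my $cookie_file = 'cookies.txt';

my $jar = HTTP::Cookies->new(
    file           => $cookie_file,
    autosave       => 1,    # write the jar back to disk when it is destroyed
    ignore_discard => 1,    # also save session cookies that have no expiry
);

# Hand the jar to the agent; cookies stored by an earlier run are loaded
# from the file and sent with matching requests.
my $agent = LWP::UserAgent->new( cookie_jar => $jar );
```

Whether a saved session is still valid on the next run depends on how long the server keeps it alive, so the script should still be prepared to log in again.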

Node Type: perlquestion [id://692261]
Approved by Corion