PerlMonks  

Re: Log In To guardian.co.uk with WWW::Mechanize

by Adrade (Pilgrim)
on May 31, 2005 at 03:47 UTC (id://461916)


in reply to Log In To guardian.co.uk with WWW::Mechanize

Here's what I think the problem is. As I understand it, the HTTP standard calls for a request redirected via a Location: header to use the same method as the original request: if a POSTed page redirects to another page, that page should also be POSTed. I think this is what WWW::Mechanize does. It is not what Firefox and other mainstream browsers do, though, and the site's developers are relying on the browser behavior (even though they should be answering with a 303 See Other, which tells the client to follow up with a GET, rather than a 301).

What you want to do is load up the cookie jar with the authentication information and then request the particular pages you're after. You're running into an error because autocheck is on: when Mechanize POSTs to a page that expects a GET, it checks whether the request worked, sees that it didn't, and everything goes ka-ploowey.

So what you need to do is authenticate yourself, as you wonderfully did (but with autocheck off), and then request the user-specific page you want to pull data from. For instance, this modification of your code authenticates you and then pulls up the 'mydetails' page:

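use strict;
use warnings;
use WWW::Mechanize;

# autocheck off, so the failing redirect after the login POST doesn't die
my $browser = WWW::Mechanize->new( cookie_jar => {}, autocheck => 0 );
$browser->get('http://users.guardian.co.uk/signin/0,12930,-1,00.html');

# select the sign-in form and fill in the credentials
$browser->form_name('regpss1') or die "Sign-in form 'regpss1' not found\n";
$browser->set_fields(
    AU_LOGIN_ID => 'your email',
    AU_PASSWORD => 'your password',
);
$browser->submit();    # the auth cookies land in the cookie jar here

# autocheck back on (set directly on the object), then fetch the user page
$browser->{autocheck} = 1;
$browser->get('http://users.guardian.co.uk/mydetails/');
print $browser->content();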
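To make the method business concrete, here's a minimal, network-free sketch of the rule browsers commonly apply when following a redirect. The `redirect_method` helper is hypothetical (it isn't part of WWW::Mechanize or LWP), and the rules are my reading of the HTTP/1.1 specs (RFC 7231 section 6.4, plus RFC 7538 for 308): 303 always continues with GET, 307 and 308 keep the original method, and 301/302 are meant to keep it too, although browsers in practice turn a POST into a GET.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: which method would a browser-style client use when
# following a redirect? Rules follow RFC 7231 sec. 6.4 as browsers commonly
# implement it (an assumption; this is not WWW::Mechanize's own code).
sub redirect_method {
    my ($method, $status) = @_;
    $method = uc $method;

    # 307 and 308 explicitly require the original method to be kept.
    return $method if $status == 307 || $status == 308;

    # 303 See Other: follow up with GET (a HEAD stays a HEAD).
    if ($status == 303) {
        return $method eq 'HEAD' ? 'HEAD' : 'GET';
    }

    # 301/302: the spec intends the method to be kept, but browsers
    # historically rewrite POST to GET, and many sites rely on that.
    if ($status == 301 || $status == 302) {
        return $method eq 'POST' ? 'GET' : $method;
    }

    # Not a redirect status this sketch handles.
    return undef;
}

for my $case ( [ 'POST', 301 ], [ 'POST', 303 ], [ 'POST', 307 ], [ 'GET', 302 ] ) {
    my ($m, $s) = @$case;
    printf "%-4s + %d -> %s\n", $m, $s, redirect_method($m, $s) // 'n/a';
}
```

Run as-is, it prints one line per case; the `POST + 301 -> GET` line is the browser behavior the Guardian's login page is counting on.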

Also, there's no need to parse all that funky JavaScript. Lots of folks have JavaScript turned off in their browsers, and if the Guardian didn't let those people browse the site, it would lose a good portion of its readers. All that JS hashing is there for added security but isn't required, as the code above demonstrates.

I hope this helps; given your request, I think this is what you're looking for. And you should give yourself a pat on the back: you were about 98% of the way there already!

Best,
  -Adam

Re^2: Log In To guardian.co.uk with WWW::Mechanize
by Cody Pendant (Prior) on May 31, 2005 at 05:53 UTC
    Thanks for that. Interesting stuff, and thanks for the encouragement. I promise not to lose the code and come back and ask again in another two years.


    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
