epoch4life has asked for the wisdom of the Perl Monks concerning the following question:
I have in the past successfully used LWP::UserAgent to crawl web sites over HTTP or HTTPS and download data, but I just can't make it work with this new server. I suspect the server deliberately refuses non-browser user agents. I read a post here suggesting that the request should impersonate Firefox, and I tried my best to do that, all to no avail.
More specifically, I'm trying to use LWP::UserAgent to automate the collection of data from a site, but I always get refused with this message:
500 Can't connect to tutorialregistration.uws.edu.au:443 (SSL connect attempt failed because of handshake problemserror:00000000:lib(0):func(0):reason(0))
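Because this error is raised while the TLS handshake is being negotiated, no HTTP request (and therefore no User-Agent or other header) has been sent yet, so impersonating Firefox cannot change the outcome. One way to confirm this is to perform the handshake directly with IO::Socket::SSL, which LWP::Protocol::https normally uses for HTTPS. The following is a minimal sketch, assuming IO::Socket::SSL is installed; `probe_tls` is a hypothetical helper name, and the list of protocol versions tried is only an illustrative choice.

```perl
use strict;
use warnings;
use IO::Socket::SSL;

# Hypothetical helper: attempt a bare TLS handshake (no HTTP at all)
# and describe the outcome. $version is passed straight to
# IO::Socket::SSL's SSL_version option.
sub probe_tls {
    my ($host, $port, $version) = @_;
    my $sock = IO::Socket::SSL->new(
        PeerHost    => $host,
        PeerPort    => $port,
        SSL_version => $version,
        Timeout     => 10,
    );
    if ($sock) {
        my $got = $sock->get_sslversion;
        $sock->close;
        return "$version ok, negotiated $got";
    }
    # SSL_ERROR holds handshake errors; $! covers DNS/TCP failures.
    my $err = $IO::Socket::SSL::SSL_ERROR || "$!";
    return "$version failed: $err";
}

# Host from the question. The version list is an assumption: it simply
# tries each protocol in turn, with SSLv23 meaning "negotiate the best".
my $host = 'tutorialregistration.uws.edu.au';
print probe_tls($host, 443, $_), "\n" for qw(SSLv23 TLSv1_2 TLSv1_1 TLSv1);
```

Run it on the same machine as the failing script. If every version fails, that suggests the problem lies in the local OpenSSL/IO::Socket::SSL setup or the server's TLS configuration rather than in LWP; if only some versions fail, that points to a protocol-negotiation issue.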
I have narrowed it down to a single URL, https://tutorialregistration.uws.edu.au/aplus/admin/adminLogin.do, which I can access directly from a browser but which fails with the above message when requested through LWP::UserAgent.
The failure can be reproduced with
perl -MLWP::Simple -e "getprint 'https://tutorialregistration.uws.edu.au/aplus/admin/adminLogin.do'"
or
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(
    GET => 'https://tutorialregistration.uws.edu.au/aplus/admin/adminLogin.do'
);

# impersonate a Firefox browser
$ua->agent("Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0");
$req->header(
    'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language' => 'en-US,en;q=0.5',
    'Accept-Encoding' => 'gzip, deflate',
    'Cookie'          => '',
    'Referer'         => 'https://www.uws.edu.au/',
    'Connection'      => 'keep-alive',
);

my $res = $ua->request($req);
print "content-type:text/html\n\n";
# decoded_content undoes the gzip/deflate encoding requested above
print $res->decoded_content;
In both cases, if I replace the URL with another HTTPS page (inside or outside the intranet), the code works fine. I really can't figure out what has gone wrong here. Please help. Many thanks.
David
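If a direct handshake test shows that the server only accepts a particular protocol version (an assumption; the replies' contents are not reproduced here), LWP::UserAgent can be told to use it through its `ssl_opts` constructor option, whose entries are passed on to IO::Socket::SSL. A minimal sketch follows; `TLSv1` is a hypothetical stand-in for whichever version actually works.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Assumption for illustration: 'TLSv1' stands in for whichever protocol
# version a direct IO::Socket::SSL handshake test showed to work.
# ssl_opts entries are handed through to IO::Socket::SSL.
my $ua = LWP::UserAgent->new(
    ssl_opts => { SSL_version => 'TLSv1' },
    timeout  => 30,
);

my $url = 'https://tutorialregistration.uws.edu.au/aplus/admin/adminLogin.do';
my $res = $ua->get($url);

# status_line shows the LWP or SSL error if the handshake still fails
print $res->is_success ? $res->decoded_content : $res->status_line . "\n";
```

Pinning a single version trades flexibility for compatibility, so it is best limited to the one host that needs it.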
Re: Request by LWP Useragent refused by the web server but not by others
by Khen1950fx (Canon) on Jun 10, 2014 at 07:33 UTC
by epoch4life (Initiate) on Jun 11, 2014 at 04:02 UTC
Re: Request by LWP Useragent refused by the web server but not by others (ssl handshake problem)
by Anonymous Monk on Jun 10, 2014 at 07:26 UTC
Re: Request by LWP Useragent refused by the web server but not by others
by locked_user sundialsvc4 (Abbot) on Jun 10, 2014 at 14:35 UTC
by epoch4life (Initiate) on Jun 11, 2014 at 04:04 UTC