m4merg has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Monks!

I'm trying to use LWP::UserAgent to get content from a URL. It worked at first, but after some time the site banned my IP (at least I think it did, because I could not access the site at all). So I decided to use a proxy, but I still can't reach this URL. Here is the code I'm using:

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
my $proxy = "http://112.137.164.232:3128";
$ua->proxy(['https'], $proxy);
$ua->agent('Mozilla/5.0');
my $links = "http://sci-hub.org/";
my $req = HTTP::Request->new(GET => $links);
my $stuff = $ua->request($req);
print "Content-type: text/html\n\n";
print $stuff->content;

With this code I can get content from other URLs (e.g. google.com), but not from the one I need (sci-hub.org). I can also reach sci-hub.org in a web browser through this same proxy (though not without it, since they banned me), but the Perl script fails with:

Can't connect to sci-hub.org:80 (Connection timed out)
LWP::Protocol::http::Socket: connect: Connection timed out at /usr/share/perl5/LWP/Protocol/http.pm line 41

So the proxy works in general and the code seems correct in general (at least I think so), but for the desired URL it doesn't work. What is the problem?

Sorry if this duplicates another topic, but I've searched hard for a solution (on this site too), tried many different approaches, and found nothing useful in the end. Thanks in advance for any help.

Re: Another one perl LWP question
by marto (Cardinal) on Jul 03, 2015 at 11:33 UTC

    Wouldn't it make more sense to discuss the issue with the site owners? Perhaps they offer an API, so you wouldn't need to scrape the data. From their splash page, it sounds like something they might be open to:

    "The Sci-Hub project works to fight inequality in information access across the world. The goal is to dismantle all barriers to knowledge distribution. Our vision is the world without paywalls, where any piece of knowledge can be Accessed freely by any person."

      A site which believes the information should be available to anyone has banned you, m4merg?

      You must have pulled a doozie.

        From time to time people post here reporting problems which have turned out to be an intrusion detection system kicking in, without human intervention.

      That's definitely a good idea, but the problem is that they don't have well-organised support. I wrote to them and didn't get any reply. I doubt they have an API, though I still hope that they do and that they will reply to me. In the meantime, I'm trying to solve the problem on my own.

Re: Another one perl LWP question
by hippo (Archbishop) on Jul 03, 2015 at 11:39 UTC

    marto is right: you should be working with the site, not against it. Otherwise you will get the proxy banned, and then you'll have two lots of people annoyed with you and still no access.

    but for the desired URL it doesn't work.

    Well, since you have not said in which way it "doesn't work", it's pretty much impossible to debug, wouldn't you agree?

      Sorry for the lack of details, I've updated the initial post.

        Great, so it's a connection timeout. Assuming this is also what you see when you deliberately don't use the proxy, the conclusion would be that you are still not using the proxy to access this URL, albeit accidentally.

        The suspicion must be that this line

        $ua->proxy(['https'], $proxy);

        should not have 'https' as the first argument since you are not trying to access an https URL. Why have you chosen 'https' here?
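        A minimal sketch of the configuration hippo is pointing towards, assuming the proxy address from the original post is still reachable: register the proxy for the http scheme (here together with https), so that a plain http:// URL is routed through it. The fetch helper is hypothetical; it also reports the response's status_line on failure instead of silently printing an empty body.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Proxy address from the original post; assumed reachable, may be stale.
my $proxy = 'http://112.137.164.232:3128';

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0');

# Register the proxy for both schemes. With only ['https'], requests to
# http:// URLs such as http://sci-hub.org/ bypass the proxy entirely.
$ua->proxy(['http', 'https'], $proxy);

# Hypothetical helper: fetch a URL and die with the HTTP status line
# (e.g. "500 Can't connect to ...") if the request did not succeed.
sub fetch {
    my ($url) = @_;
    my $res = $ua->get($url);
    die 'Request failed: ', $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}
```

        Usage would be something like print fetch('http://sci-hub.org/'); and a CGI script still needs to print its Content-type header first, as the original code does.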

Re: Another one perl LWP question
by marinersk (Priest) on Jul 03, 2015 at 11:40 UTC

    Re^3: proxy problwm LWP shows a working snippet; it suggests paying close attention to where you specify 'http' and where 'https' in your various calls.

    Also, I've found that the trailing slash is a nit-picky detail which sometimes is critical.

      Thanks for the answer! I changed 'https' to 'http', 'ftp' in this line: $ua->proxy(['https'], $proxy);

      and it worked! But why does 'https' work for http://www.example.com/bar but not for the URL I need? As I understand it, this setting concerns the connection to the proxy, not to the website. I changed the proxy connection (which was actually working for the example site) without changing how I access sci-hub, and it worked. I can't see the logic here. What's going on?

        In this case, http works and https doesn't because the proxy call is establishing which kinds of connections it is supposed to handle for you, and you are making an http call.

        As a guess about your other example, perhaps that one is getting converted to an https call behind the scenes, thus enabling your proxy call to handle it?
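        To illustrate the per-scheme behaviour marinersk describes, here is a small sketch that only inspects the user agent's configuration and makes no network requests. With the proxy registered for ['https'] only, as in the original post, there is no proxy for the http scheme, so http://sci-hub.org/ is fetched directly from the banned IP. The route_for helper is hypothetical, not part of LWP, and for simplicity it ignores any no_proxy exclusions. This also suggests why other sites such as google.com seemed to work: they were most likely reached directly, without the proxy, because they had not blocked the IP.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

my $proxy = 'http://112.137.164.232:3128';

# The configuration from the original post: a proxy for https ONLY.
# env_proxy => 0 keeps proxies from the environment out of the picture.
my $ua = LWP::UserAgent->new(env_proxy => 0);
$ua->proxy(['https'], $proxy);

# Hypothetical helper (not part of LWP): look up the proxy registered
# for a URL's scheme, mirroring how LWP picks a proxy per request.
# Simplification: no_proxy exclusions are ignored.
sub route_for {
    my ($agent, $url) = @_;
    my $scheme = URI->new($url)->scheme;
    return $agent->proxy($scheme) || 'direct';
}

for my $url ('http://sci-hub.org/', 'https://sci-hub.org/') {
    print "$url -> ", route_for($ua, $url), "\n";
}
```

        Running it reports direct for the http:// URL and the proxy address for the https:// one; adding 'http' to the scheme list changes the first line as well.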