YarNik has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I'm trying to get a page, but it's blocked by Cloudflare.

Maybe I need to keep the connection open for 5 seconds, but how do I do this?

I tried using `keep_alive => 1`, but how do I set the duration?

    use HTTP::Cookies;
    my $cookie_jar = HTTP::Cookies->new(
        file     => "lwp_cookies.dat",
        autosave => 1,
    );

    use LWP;
    my $browser = LWP::UserAgent->new(
        keep_alive => 1,
    );
    $browser->cookie_jar($cookie_jar);

    my $response = $browser->get('https://bittrex.com/');
    if ($response->is_success) {
        print "1:" . $response->content;
    }
    else {
        print "2:" . $response->status_line;
    }

Re: LWP and Cloudflare
by marto (Cardinal) on Nov 13, 2017 at 15:35 UTC

    Why aren't you using their api?

      I'm already using the API; now I need access to the page.

        That's against their terms. The linked API docs state: "If you have any questions, feedback or recommendation for API support you can post a question in our support center". If there's something you want that the API does not do, ask them for it.

Re: LWP and Cloudflare
by holli (Abbot) on Nov 13, 2017 at 16:50 UTC
    Try setting the User-Agent string to something a real browser sends, e.g. `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36`.

    That's usually enough to get around such "blockades".
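For reference, setting the User-Agent string in LWP could look roughly like the sketch below. It reuses the `keep_alive` setting and the User-Agent string from this thread. The `make_browser` helper is a hypothetical name introduced only for illustration, and the cookie jar is kept in memory here rather than in `lwp_cookies.dat`. This is a sketch of the suggestion, not a guarantee that any particular site will accept the request.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical helper (not from the original post): builds an
# LWP::UserAgent that sends the given User-Agent header.
sub make_browser {
    my ($ua_string) = @_;

    # In-memory cookie jar; the original post used a file-backed one.
    my $cookie_jar = HTTP::Cookies->new;

    return LWP::UserAgent->new(
        agent      => $ua_string,    # sent as the User-Agent header
        keep_alive => 1,             # connection cache size, not a time
        cookie_jar => $cookie_jar,
    );
}

my $browser = make_browser(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
  . '(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
);

# The request is then made exactly as in the original post:
# my $response = $browser->get('https://bittrex.com/');
```

Note that in LWP::UserAgent the `keep_alive` value sets how many connections the connection cache may hold; it is not a duration. The separate `timeout` option controls how long a request may take.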

    Edit: Tried that, getting a 503 now. I also noticed the site is protected by Cloudflare. Now you enter dangerous waters.

    Terms of Use apply only to users who have agreed to them (read: who are logged in), so it's OK to scrape a site even if they don't want you to. Bypassing Cloudflare and similar measures, however, is outright hacking and illegal in most places.


    holli

    You can lead your users to water, but alas, you cannot drown them.