Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

A WWW::Mechanize GET request fails for the URL 'https://ardsbc.erecruit.co.uk/'. Can you tell me what the problem might be?

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $url  = "https://ardsbc.erecruit.co.uk/";
    my $mech = WWW::Mechanize->new();
    $mech->agent_alias('Mac Safari');

    # With autocheck on (the default), get() dies if the request fails.
    $mech->get($url);

I tried setting the request headers to the ones shown in Firefox's network monitor, but it made no difference: I still get the same error (Error in Connecting to URL).
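Note that request headers only matter once a connection to the server exists; if the request never leaves the machine, no choice of headers can change the outcome. For reference, here is a minimal sketch of how browser-like headers can be set on a WWW::Mechanize object. The helper name `browser_like_mech` and the header values are illustrative assumptions, not what Firefox actually sends to this site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: build a Mechanize object that sends browser-like
# request headers. The values are assumptions; copy the real ones from
# Firefox's network monitor if you want an exact match.
sub browser_like_mech {
    my $mech = WWW::Mechanize->new( autocheck => 0 );

    # agent_alias() picks one of Mechanize's canned User-Agent strings.
    $mech->agent_alias('Mac Safari');

    # default_header() is inherited from LWP::UserAgent and adds the
    # header to every request this object sends.
    $mech->default_header(
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' );
    $mech->default_header( 'Accept-Language' => 'en-GB,en;q=0.5' );

    return $mech;
}

# Usage: perl headers.pl URL
if (@ARGV) {
    my $res = browser_like_mech()->get( $ARGV[0] );
    print $res->status_line, "\n";
}
```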

These are the response headers I get when I print the response as a string:

    Content-Type: text/plain
    Client-Date: Wed, 27 Aug 2014 04:34:35 GMT
    Client-Warning: Internal response
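These are the headers LWP (the library WWW::Mechanize is built on) attaches to a response it generates itself, for example when it could not connect; `Client-Warning: Internal response` is the giveaway that the server never answered. A minimal diagnostic sketch, assuming the URL is passed as the first command-line argument; `describe_response` is a hypothetical helper, not part of WWW::Mechanize. Turning `autocheck` off lets the failed response be inspected instead of dying.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: classify an HTTP::Response returned by $mech->get.
sub describe_response {
    my ($res) = @_;
    return 'ok: ' . $res->status_line if $res->is_success;

    my $warning = $res->header('Client-Warning') // '';
    if ( $warning eq 'Internal response' ) {
        # LWP built this response locally: the request never got an
        # answer from the server (DNS, connection, proxy or SSL trouble).
        return 'local: ' . $res->status_line;
    }

    # Anything else is a real error status sent back by the server.
    return 'server: ' . $res->status_line;
}

# Usage: perl diagnose.pl URL
if (@ARGV) {
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    my $res  = $mech->get( $ARGV[0] );
    print describe_response($res), "\n";

    # For internal responses the body usually holds LWP's error text.
    print $res->content, "\n" unless $res->is_success;
}
```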

I later checked the site's robots.txt and I am fully aware of what it says, but I just want to know why the GET fails even though I am mimicking Firefox with the same request headers. Why can't Mechanize fetch that URL? Any clues?

Thanks a lot

Re: www::mechanize $mech->get(); not working for a particular URL!
by Corion (Patriarch) on Aug 27, 2014 at 11:31 UTC
    Error in Connecting to URL

    This means your network connection never reaches the remote site.

    Maybe your network only reaches the outside world through a proxy.

    I suggest you consult with your network administrator on how to best connect to outside websites.
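    If your browsers reach the internet through a proxy, WWW::Mechanize has to use it too. A minimal sketch, assuming the proxy address is known: WWW::Mechanize->new already calls env_proxy() unless `noproxy => 1` is passed, so the `http_proxy`/`https_proxy` environment variables are honoured by default, and proxy() (inherited from LWP::UserAgent) sets one explicitly. The helper `proxied_mech` and the address `http://proxy.example.com:8080/` are hypothetical placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: build a Mechanize object that uses a proxy.
sub proxied_mech {
    my ($proxy_url) = @_;

    # By default new() calls env_proxy(), so http_proxy / https_proxy /
    # no_proxy from the environment are already picked up.
    my $mech = WWW::Mechanize->new( autocheck => 0 );

    # An explicit proxy overrides whatever the environment said.
    $mech->proxy( [ 'http', 'https' ], $proxy_url ) if defined $proxy_url;

    return $mech;
}

# Usage: perl proxy.pl URL [PROXY_URL]
if (@ARGV) {
    my ( $url, $proxy ) = @ARGV;
    my $res = proxied_mech($proxy)->get($url);
    print $res->status_line, "\n";
}
```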

      Thanks Corion, but I can access the site with browsers (Chrome and Firefox).

        So, have you asked your network administrator which proxy servers your browsers are configured to use to connect to the internet?

        If all else fails, consider using a tool such as Wireshark to find out how your browsers connect to the website and then replicate that method with WWW::Mechanize.
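        To compare what WWW::Mechanize sends with what the browser sends, LWP::UserAgent's handler hooks can print each outgoing request and each completed response. A minimal sketch; `traced_mech` and its `out` option are hypothetical names. The `request_send` hook runs before LWP opens the connection, so the request is printed even if the connection then fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: a Mechanize object that prints its traffic.
# Output goes to STDERR unless another filehandle is given as 'out'.
sub traced_mech {
    my (%opt) = @_;
    my $out = $opt{out} // \*STDERR;

    my $mech = WWW::Mechanize->new( autocheck => 0 );

    # Print the request line and headers just before sending.
    $mech->add_handler( request_send => sub { print {$out} shift->as_string; return } );

    # Print the status line, headers and body once the response is done.
    $mech->add_handler( response_done => sub { print {$out} shift->as_string; return } );

    return $mech;
}

# Usage: perl trace.pl URL
if (@ARGV) {
    traced_mech()->get( $ARGV[0] );
}
```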

Re: www::mechanize $mech->get(); not working for a particular URL!
by Anonymous Monk on Aug 27, 2014 at 07:43 UTC
    Here's a clue: they don't want you scraping their site. Get over it.

      lol, thank you for your advice; I expected this comment! No problem.

      Now, forget about the URL. If you know Mechanize, why can't a Mechanize script behave exactly like Firefox when it is given the same request headers (at least for this URL)? I don't understand that!

      Also, I was hoping people who know WWW::Mechanize would comment on this thread, but thanks again!

        I'm well versed in the module, with many years of experience; we just don't help spammers or rogue content scrapers earn a quick buck around here.