praveenzx has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

So far, I have written a few scripts that crawl data from websites using the WWW::Mechanize module. Now I need to crawl data from http://www.etfsecurities.com/en/etfscalculations/etfsmsl.aspx?region=us, but I don't know what's happening: it's just a simple GET request and it's not working. Can someone guide me to solve this issue?

    #!/usr/bin/perl -w

    use strict;
    use WWW::Mechanize;
    use HTML::TreeBuilder;
    use Crypt::SSLeay;
    use Data::Dumper;

    my $mech = WWW::Mechanize->new(stack_depth => 12);
    $mech->agent_alias('Windows Mozilla');
    $mech->get(qq{http://www.etfsecurities.com/en/etfscalculations/etfsmsl.aspx?region=us});
    print $mech->content();
    exit;
Running the script fails with this error: Error GETing http://www.etfsecurities.com/en/etfscalculations/etfsmsl.aspx?region=us: Internal Server Error at script.pl line 11

Re: Issue With WWW::Mechanize
by Anonymous Monk on May 15, 2012 at 05:49 UTC
    Turn on logging and dump the full response. The body the server sends back is an ASP.NET error page:
      Server Error in '/' Application.

      Object reference not set to an instance of an object.

      Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

      Exception Details: System.NullReferenceException: Object reference not set to an instance of an object.

      Source Error:
      An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.

      Stack Trace:
      [NullReferenceException: Object reference not set to an instance of an object.]
         Etfs.etfsMsl.getInternationalDoubleValue(String input, Int32 decimalPlaces, Double divider) +53
         Etfs.etfsMsl.CreateManualSecurities() +6181
         Etfs.etfsMsl.Page_Load(Object sender, EventArgs e) +1361
         System.Web.UI.Control.OnLoad(EventArgs e) +99
         System.Web.UI.Control.LoadRecursive() +50
         System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +627

      Version Information: Microsoft .NET Framework Version:2.0.50727.4952; ASP.NET Version:2.0.50727.4955

      This error page might contain sensitive information because ASP.NET is configured to show verbose error messages using <customErrors mode="Off"/>. Consider using <customErrors mode="On"/> or <customErrors mode="RemoteOnly"/> in production environments.

        These are not the headers of the response.

        On the other hand, this message means that the remote server encountered an error: it did not expect the data you sent it. Talk to the administrator of the remote program to find out what went wrong and what you can do to get the results you want.
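A minimal sketch of the "turn on logging, dump the full response" advice, assuming a reasonably recent WWW::Mechanize and libwww-perl. The helper names `make_logging_mech`, `full_response_text` and `classify_status` are hypothetical. It switches off `autocheck` so a 500 no longer dies inside `get`, hooks LWP's `request_send` and `response_done` handlers to print each request and response, and uses HTTP::Status to tell a server-side failure (5xx, as in this thread) apart from a rejected request (4xx).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Status qw(is_client_error is_server_error);

# Hypothetical helper: a Mechanize object that logs traffic and does not
# die on HTTP errors.
sub make_logging_mech {
    # autocheck => 0: get() returns normally on a 500, so the error page
    # the server sent back can still be inspected afterwards.
    my $mech = WWW::Mechanize->new( autocheck => 0 );

    # WWW::Mechanize is a subclass of LWP::UserAgent, so LWP's handler
    # hooks work here. dump() in void context prints the message's first
    # line, its headers and the beginning of its body to STDOUT.
    $mech->add_handler( request_send  => sub { shift->dump; return } );
    $mech->add_handler( response_done => sub { shift->dump; return } );
    return $mech;
}

# Hypothetical helper: status line, headers, then the complete decoded body
# (dump() above truncates long bodies; this does not).
sub full_response_text {
    my ($res) = @_;
    return join "\n",
        $res->status_line,
        $res->headers->as_string,
        $res->decoded_content // '';
}

# Hypothetical helper: which side of the connection the problem is on.
sub classify_status {
    my ($code) = @_;
    return 'server' if is_server_error($code);   # 5xx: remote program failed
    return 'client' if is_client_error($code);   # 4xx: request was rejected
    return 'none';                               # 1xx, 2xx, 3xx
}
```

With the URL from the question, `my $mech = make_logging_mech(); $mech->get($url); print full_response_text($mech->response), "\n", classify_status($mech->status), "\n";` prints the logged traffic, then the status line, headers and error page, and would report `server` for a 500 like the one above.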

Re: Issue With WWW::Mechanize
by praveenzx (Novice) on May 23, 2012 at 04:59 UTC
    Hi Monks, I fixed the issue by adding some request headers. Now it's working fine.
        #!/usr/bin/perl -w
        use strict;
        use WWW::Mechanize;
        use HTML::TreeBuilder;
        use Data::Dumper;

        my $mech = WWW::Mechanize->new();
        my $url  = 'http://www.etfsecurities.com/en/etfscalculations/etfsmsl.aspx?region=us';

        $mech->agent_alias('Windows Mozilla');

        # Browser-like request headers that made the server respond normally.
        $mech->add_header('Host'            => 'www.etfsecurities.com');
        $mech->add_header('Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
        $mech->add_header('Accept-Language' => 'en-us,en;q=0.5');
        $mech->add_header('Accept-Encoding' => 'gzip, deflate');
        $mech->add_header('Connection'      => 'keep-alive');
        $mech->add_header('Cache-Control'   => 'private');
        $mech->add_header('Content-Type'    => 'text/html; charset=utf-8');

        $mech->get($url);

        # content() returns decoded characters, so encode them on output
        # to avoid "Wide character in print" warnings.
        binmode(STDOUT, ':encoding(UTF-8)');
        print $mech->content();

        # Save a copy of the page, and stop with the OS error if the file can't be opened.
        open(my $wrt, '>:encoding(UTF-8)', 'praveenzx.htm')
            or die "Cannot open praveenzx.htm: $!";
        print $wrt $mech->content();
        close($wrt);
    Thanks for your advice, praveenzx~
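It is not clear from the thread which of the added headers the server actually needed. As a minimal sketch, assuming libwww-perl's documented behaviour that a `request_send` handler returning a response object stops the request from being sent, the hypothetical helper `preview_request_headers` shows the exact header block WWW::Mechanize would send for a URL, without contacting the server, so it can be compared with what a browser sends.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Response;

# Hypothetical helper: return the request headers WWW::Mechanize would send
# for $url with the given extra headers, without contacting the server.
sub preview_request_headers {
    my ( $url, %extra ) = @_;

    my $mech = WWW::Mechanize->new( autocheck => 0 );
    $mech->agent_alias('Windows Mozilla');
    $mech->add_header( $_ => $extra{$_} ) for sort keys %extra;

    my $sent;
    # Returning a response object from a request_send handler makes LWP
    # use that response instead of sending the request over the network.
    $mech->add_handler(
        request_send => sub {
            my ($request) = @_;
            $sent = $request->headers->as_string;
            return HTTP::Response->new( 200, 'OK',
                [ 'Content-Type' => 'text/plain' ], '' );
        }
    );

    $mech->get($url);
    return $sent;
}
```

For example, `print preview_request_headers($url, 'Accept-Language' => 'en-us,en;q=0.5');` prints the headers for one candidate. Trying the headers from the list above one at a time against the live site (with a normal, non-short-circuited `get`) is one way to narrow down which of them the server depends on.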