udaybhaskar has asked for the wisdom of the Perl Monks concerning the following question:

I guess this is more a question of how to get around a website blocking LWP::UserAgent. I have been using this code for a while now:
    #!/usr/bin/perl -w
    use diagnostics;
    use strict;
    use LWP;
    use Date::Manip;
    use HTTP::Cookies;
    use URI;

    my $cookie_jar;
    $cookie_jar = HTTP::Cookies->new(
        'file'     => 'cookies.lwp',
        'autosave' => 1,
    );

    my $url = URI->new('http://www.nseindia.com/marketinfo/indices/indexwatch.jsp');
    my $browser = LWP::UserAgent->new(
        timeout => '45',
        agent   => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11',
    );
    $browser->cookie_jar($cookie_jar);
    push @{ $browser->requests_redirectable }, 'POST';

    my $html_page;
    my $response = $browser->get($url);
    if ($response->is_error()) {
        print "error in getting index ".$response->status_line()."\n";
        print "try again...\n";
    }
    else {
        $html_page = $response->content();
        if ($html_page =~ m{<a href=#top>Top</a></center>}) {
            print "got Index\n";
        }
        elsif ($html_page =~ m/Your request could not be processed/) {
            print "\t\t\tNSE is down. Trying again...\n\n";
        }
        else {
            print "$html_page\n";
            print "failed to get index. Trying again...\n";
        }
    }

This code worked till yesterday. However today I am hitting the 403 error.

    error in getting index 403 Forbidden
    try again...

I used the same User-Agent string as the Mozilla browser, which retrieves the page successfully. I also tried from a different IP address to make sure that my IP is not blocked (I run the script once a day and am not running a bot).

Looking for some help in getting this going again.

Thanks
Uday

Replies are listed 'Best First'.
Re: Sudden problems with LWP::UserAgent
by Corion (Patriarch) on Oct 27, 2010 at 07:08 UTC

    HTTP error 403, "Forbidden" - this should tell you that the website administrator does not want you visiting anymore.

      Their user agreement does not forbid automatic retrieval of quotes: http://www.nse-india.com/disclaimer.htm. I am using it for a legitimate purpose.

      Thanks
      Uday

        This is not an issue where Perl is of relevance. You are violating the website's terms of use, so they block you.

        It does not matter whether you claim you have a "legitimate purpose", and Perl does not come into it. Talk to them, most likely you can pay for better access to their data.

        That's a disclaimer, not a user agreement!! Maybe they do not mind interactive use but do not want scripts hitting it. That's their call to make, just like they can cut you off if they want to. I think Corion has called this one correctly.

        Elda Taluta; Sarks Sark; Ark Arks

Re: Sudden problems with LWP::UserAgent
by morgon (Priest) on Oct 27, 2010 at 14:35 UTC
    Have you tried accessing the page with a browser?

    Could it be that you need to get a new cookie? (I think you are storing and reusing cookies.)

    Have you tried setting the user-agent to that of the google-bot?
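
The "new cookie" idea can be tried without deleting cookies.lwp by hand. A minimal sketch, assuming a stale stored cookie is what triggers the refusal (the thread does not confirm that); the cookie name and value below are made up for illustration, and the jar is kept in memory so nothing is written to disk:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;
use LWP::UserAgent;

# In-memory jar (no 'file' argument), so this sketch never touches cookies.lwp.
my $cookie_jar = HTTP::Cookies->new;

# Hypothetical stale cookie, standing in for whatever the site set earlier.
# Arguments: version, key, value, path, domain, port, path_spec, secure, maxage, discard
$cookie_jar->set_cookie(0, 'session', 'stale', '/', 'www.nseindia.com',
                        undef, 1, 0, 3600, 0);
my $before = $cookie_jar->as_string;

# Drop every stored cookie so the next request starts a fresh session.
$cookie_jar->clear;
my $after = $cookie_jar->as_string;

# Attach the emptied jar to the user agent as in the original script.
my $browser = LWP::UserAgent->new(timeout => 45, cookie_jar => $cookie_jar);

print "cookies before clear:\n$before";
print "cookies after clear: ", ($after eq '' ? "(none)" : $after), "\n";
```

With the file-backed jar from the original script, calling clear before the first request should have the same effect as removing cookies.lwp.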

      I did not try the Googlebot user agent, but I did try LWP::RobotUA and that didn't help either. I will investigate further and post an update once I succeed.

      Thanks Uday

        I got it working again. Looks like they now expect

        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',

        as a header field. However, I am now plugging in all the HTTP headers that Firefox sends for my pages of interest, so that I don't end up in this trouble again.

        Thanks and Regards
        Uday
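
For anyone landing here with the same 403, here is a minimal sketch of the fix described above, using default_header so the Accept header goes out with every request. The Accept value and User-Agent string come from this thread; the Accept-Language value and the make_browser helper are assumptions added for illustration. The request is only prepared, not sent, so the headers can be inspected offline:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: build a user agent that sends browser-like headers.
sub make_browser {
    my $ua = LWP::UserAgent->new(
        timeout => 45,
        agent   => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11',
    );

    # The header the site reportedly started requiring (value from the thread).
    $ua->default_header(
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    );

    # Assumption: an illustrative extra header; copy the real values from
    # your own browser rather than trusting this one.
    $ua->default_header('Accept-Language' => 'en-us,en;q=0.5');

    return $ua;
}

my $browser = make_browser();

# Apply the default headers to a request without touching the network.
my $request = HTTP::Request->new(
    GET => 'http://www.nseindia.com/marketinfo/indices/indexwatch.jsp',
);
$browser->prepare_request($request);

print $request->headers->as_string;
```

To fetch the page for real, call $browser->get($url) as in the original script; the default headers are added automatically.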