in reply to Anyone know why I can't scrape this page?

Regarding the first update: I thought you meant a 500 Internal Server Error at first (very confusing :P). From the Mechanize FAQ:

My Mech program gets these 500 errors. A 500 error from the web server says that the program on the server side died. Probably the web server program was expecting certain inputs that you didn't supply, and instead of handling it nicely, the program died. Whatever the cause of the 500 error, if it works in the browser, but not in your Mech program, you're not acting like the browser.

It takes a helluva long time to connect to the site using a normal browser, so I do not think the problem is your script. Either that, or the JavaScript on the page is severely interfering with Mech. If it is the latter, I suggest WWW::Selenium (I heard it knows how to interpret JavaScript; take the suggestion with a grain of sugar).
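If it turns out to be plain slowness, you could also bump Mech's timeout and have it identify itself as a browser, in case the server is fussy about the User-Agent. A quick, untested sketch (the alias, timeout, and url here are placeholders):

use strict;
use warnings;
use WWW::Mechanize;

# Build a Mech object that behaves a bit more like a browser.
my $mech = WWW::Mechanize->new( autocheck => 0 );   # don't die on HTTP errors; check them ourselves
$mech->agent_alias('Windows IE 6');                 # send a browser-like User-Agent string
$mech->timeout(120);                                # the site is slow, so give it plenty of time

my $url = 'http://www.example.com/slow-page';       # placeholder url, swap in the real page
$mech->get($url);
print $mech->success
    ? "Got it\n"
    : "Failed: " . $mech->response->status_line . "\n";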

Regarding the second update: Post thine code.

I'm so adjective, I verb nouns!

chomp; # nom nom nom


Re^2: Anyone know why I can't scrape this page?
by lv211 (Beadle) on Sep 07, 2008 at 17:48 UTC
    Here is the code. I know the problem happens when I try to pass the variable from the array to the subroutine. Do you know how I can pass the variable from the array into the subroutine with single quotes?

    I was thinking it would look something like my @game = qr(@_); but I couldn't get that to work or find documentation that could answer my question.

    #!/usr/bin/perl
    use WWW::Mechanize;
    #use strict;

    ### Create the Bot and set the Variables
    my $mech = WWW::Mechanize->new;
    my $url = 'http://www.vegasinsider.com/nfl/odds/las-vegas/line-movement/bengals-@-ravens.cfm/date/9-07-08/time/1300#J';

    save_file ($url);

    ####
    sub save_file {
        my $mech = WWW::Mechanize->new;
        $mech->timeout(60);
        my @game = @_;
        foreach (@game) {
            print "$_\n";
            $_ =~ m{http://www.vegasinsider.com/(.*?)/odds/(.*?)/line-movement/(.*?)-@-(.*?).cfm/date/(.*?)/time/};
            print "$1 $2 $3 $4 $5\n";
            my $filename = 'C:\Documents and Settings\Owner\Desktop\VI Data\sub.html';
            print "Getting $filename\n";
            $mech->get( "$_", ":content_file" => $filename ) or die "Can't get url";
            print $mech->status;
            my $data = $mech->content;
            print " ", -s $filename, " bytes\n";
            print $data;
        }
    }

    ## my $file = 'C:\Documents and Settings\Owner\Desktop\VI Data\new.html';
    $mech->timeout(60);
    $mech->get($url, ":content_file" => $file) or die "Can't get url";
    print $mech->status;
    my $data = $mech->content;
    #print " ", -s $filename, " bytes\n";
    print $data;

      At first glance, let me suggest that you uncomment use strict; and also make sure you use warnings; (You can also use warnings by placing -w at the end of your hashbang line: #!/usr/bin/perl -w).
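      That is, the top of the script would look something like this (use warnings; being the lexical equivalent of -w):

      #!/usr/bin/perl
      use strict;          # catches undeclared variables and other common slips
      use warnings;        # or put -w on the hashbang line instead
      use WWW::Mechanize;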

      I know the problem happens when I try to pass the variable from the array to the subroutine.

      You are passing a scalar to the subroutine and then assigning that scalar to an array. You are not passing an array to the subroutine. I'll try to explain:

      my $url = 'http://www.vegasinsider.com/nfl/odds/.../1300#J';  # Assigning that url to the scalar $url.
      save_file ($url);   # Calling the subroutine while passing the scalar $url
      sub save_file {     # Initiating sub
          my @game = @_;  # Populating an array with all the contents of the arguments that are passed to the subroutine. In this case, just one; $url

      Anyway, continuing on: I assume you are going to pass multiple urls to the subroutine eventually, so the call from your link-harvesting code would presumably hand save_file a whole list of links.
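      A rough sketch of what that could look like (@game_urls is a made-up name; the single quotes keep the @ in each url literal):

      my @game_urls = (
          'http://www.vegasinsider.com/nfl/odds/las-vegas/line-movement/bengals-@-ravens.cfm/date/9-07-08/time/1300#J',
          # ... more links harvested from the odds page ...
      );
      save_file(@game_urls);   # inside the sub, @game = @_ picks up every url in the list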

      Do you know how I can pass the variable from the array into the subroutine with single quotes?

      See my above explanation. I am confused as to what you mean here. Could you outline what you are trying to do?

      I'm so adjective, I verb nouns!

      chomp; # nom nom nom

        Basically the program is going to pull the line movement data from the website. I have another portion of the program which harvests the links off the site and puts them into an array. For example, if I'm looking at NFL odds in Vegas, it will go to the NFL odds in Vegas page and pull all the links to the games there.

        From this array I want to use a subroutine (using WWW::Mechanize) to go to each link in the array and download the page to my computer.

        I think the problem I have stems from the fact that there is an @ sign in the url. When I test the script with a hard-coded variable in single quotes, I'm able to get the page. When I go through the subroutine, I get a 400 Bad Request error.
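        To illustrate, grabbing the page with the url hard-coded in single quotes works fine (same url and variables as in the code above):

        my $url = 'http://www.vegasinsider.com/nfl/odds/las-vegas/line-movement/bengals-@-ravens.cfm/date/9-07-08/time/1300#J';   # single quotes, so the @ stays literal
        $mech->get($url, ":content_file" => $file);

        It is only when the url goes through the array and the subroutine that the request comes back with 400 Bad Request.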