
Here is the code. I know the problem happens when I try to pass the variable from the array to the subroutine. Do you know how I can pass the variable from the array into the subroutine with single quotes?

I was thinking it would look something like my @game = qr(@_); but I couldn't get that to work or find documentation that could answer my question.

#!/usr/bin/perl
use WWW::Mechanize;
#use strict;

### Create the Bot and set the Variables
my $mech = WWW::Mechanize->new;
my $url = 'http://www.vegasinsider.com/nfl/odds/las-vegas/line-movement/bengals-@-ravens.cfm/date/9-07-08/time/1300#J';
save_file ($url);

####
sub save_file {
    my $mech = WWW::Mechanize->new;
    $mech->timeout(60);
    my @game = @_;
    foreach (@game){
        print "$_\n";
        $_ =~ m{http://www.vegasinsider.com/(.*?)/odds/(.*?)/line-movement/(.*?)-@-(.*?).cfm/date/(.*?)/time/};
        print "$1 $2 $3 $4 $5\n";
        my $filename = 'C:\Documents and Settings\Owner\Desktop\VI Data\sub.html';
        print "Getting $filename\n";
        $mech->get( "$_", ":content_file" => $filename ) or die "Can't get url";
        print $mech->status;
        my $data = $mech->content;
        print " ", -s $filename, " bytes\n";
        print $data;
    }
}
##
my $file = 'C:\Documents and Settings\Owner\Desktop\VI Data\new.html';
$mech->timeout(60);
$mech->get($url, ":content_file" => $file) or die "Can't get url";
print $mech->status;
my $data = $mech->content;
#print " ", -s $filename, " bytes\n";
print $data;

Re^3: Anyone know why I can't scrape this page?
by Lawliet (Curate) on Sep 07, 2008 at 18:17 UTC

    At first glance, let me suggest that you uncomment use strict; and also make sure you use warnings; (You can also use warnings by placing -w at the end of your hashbang line: #!/usr/bin/perl -w).

    I know the problem happens when I try to pass the variable from the array to the subroutine.

    You are passing a scalar to the subroutine and then assigning that scalar to an array. You are not passing an array to the subroutine. I'll try to explain:

    my $url = 'http://www.vegasinsider.com/nfl/odds/.../1300#J'; # Assigning that url to the scalar $url.
    save_file ($url); # Calling the subroutine while passing the scalar $url
    sub save_file { # Initiating sub
        # Populating an array with all the contents of the arguments that are
        # passed to the subroutine. In this case, just one; $url
        my @game = @_;

    I assume you are going to pass multiple urls to the subroutine. Anyway, continuing on.
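
    A minimal sketch of the difference between passing one scalar and passing a whole array, assuming the harvested links live in an array. The names @links and count_urls are hypothetical, and the second URL is made up for illustration; only the first URL comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical list of harvested links; only the first URL is from the thread.
my @links = (
    'http://www.vegasinsider.com/nfl/odds/las-vegas/line-movement/bengals-@-ravens.cfm/date/9-07-08/time/1300#J',
    'http://www.example.com/another-@-game.cfm',
);

# count_urls is an illustrative helper, not part of the original script.
sub count_urls {
    my @game = @_;    # @_ holds every argument, flattened into one list
    return scalar @game;
}

my $one  = count_urls( $links[0] );    # one scalar passed: @game has 1 element
my $many = count_urls(@links);         # whole array passed: @game has 2 elements

print "$one $many\n";
```

    Either way the subroutine sees a plain list in @_, so the same foreach loop handles one URL or many.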

    Do you know how I can pass the variable from the array into the subroutine with single quotes?

    See my above explanation. I am confused as to what you mean here. Could you outline what you are trying to do?

    I'm so adjective, I verb nouns!

    chomp; # nom nom nom

      Basically, the program pulls the line movement data from the website. I have another portion of the program that harvests the links off the site and puts them into an array. For example, if I'm looking at NFL odds in Vegas, it goes to the NFL odds in Vegas page and pulls all the links to the games there.

      From this array I want to use a subroutine (using WWW::Mechanize) to go to the links in the array and download the page to my computer.

      I think the problem stems from the @ sign in the url. When I test the script with a variable assigned in single quotes, I'm able to get the page. When I use the subroutine, I get an error: 400 Bad Request.

        That is odd. Your code works fine when I simply ask to show the content.

        #!/usr/bin/perl -w
        use strict;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new;
        my $url = 'http://www.vegasinsider.com/nfl/odds/las-vegas/line-movement/bengals-@-ravens.cfm/date/9-07-08/time/1300#J';
        save_file($url);

        sub save_file {
            my $mech = WWW::Mechanize->new;
            $mech->timeout(60);
            my @game = @_;
            foreach(@game) {
                print "$_\n";
                $mech->get($url) or die "Can't get url";
                print $mech->content;
            }
        }

        As long as the url is initially quoted in single quotes, it should not be interpolated no matter how it is used later. Until I can reproduce the error, I cannot fully help you diagnose the problem.
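
        A small sketch that checks this claim without touching the network: the url is single-quoted once, passed through @_, and stringified with "$_" as in the original loop. The helper name pass_through is hypothetical.

```perl
use strict;
use warnings;

# Single quotes: the @ in the literal is never interpolated.
my $url = 'http://www.vegasinsider.com/nfl/odds/las-vegas/line-movement/bengals-@-ravens.cfm/date/9-07-08/time/1300#J';

# pass_through is an illustrative helper that mimics the loop in save_file.
sub pass_through {
    my @game = @_;
    my @seen;
    foreach (@game) {
        # "$_" interpolates the variable $_ once; the @ inside its value
        # is ordinary data at this point and is not interpolated again.
        push @seen, "$_";
    }
    return @seen;
}

my ($seen) = pass_through($url);
print $seen eq $url ? "unchanged\n" : "changed\n";
```

        If this prints "unchanged", the string reaching get() inside the subroutine is the same as the one outside it, which suggests the 400 comes from something other than quoting.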

        There is one thing that I feel obligated to bring to the table that I think you may have missed:

        NOTE: Because :content_file causes the page contents to be stored in a file instead of the response object, some Mech functions that expect it to be there won't work as expected. Use with caution.
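
        One way to work with that caveat, sketched under assumptions: check the request's status explicitly and read the page back from the saved file instead of calling content. The helper name fetch_to_file is hypothetical, and autocheck => 0 is set here so that a failed request returns rather than dies. Note that get() returns a response object even for a 400, so "or die" on its return value would not fire.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# fetch_to_file is an illustrative helper, not part of the original script.
sub fetch_to_file {
    my ( $url, $file ) = @_;

    # autocheck => 0 so that a failed request returns instead of dying.
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    $mech->timeout(60);

    my $response = $mech->get( $url, ':content_file' => $file );

    # The response object is true even on failure, so test success explicitly.
    unless ( $mech->success ) {
        warn "Request failed: ", $response->status_line, "\n";
        return;
    }

    # Read the saved page from disk rather than from $mech->content.
    open my $fh, '<', $file or die "Can't open $file: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    return $data;
}
```

        It could be called as fetch_to_file($url, 'sub.html'), where 'sub.html' is a placeholder path; an undefined return value means the request failed and the status line was printed as a warning.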

        Unfortunately, that does not answer why your code works outside a subroutine.
