I'm trying to scrape some information off this site, but when I fetch the page I get an error. When I open the link in my browser, it looks like the page is being redirected to another page. I checked the headers and couldn't figure out what is going on. Any suggestions?
#!/usr/bin/perl
use WWW::Mechanize;
my $mech = WWW::Mechanize->new;
my $url = "http://www.vegasinsider.com/nfl/odds/las-vegas/line-movement/jets-@-dolphins.cfm/date/9-07-08/";
$mech->get($url) or die "Can't get url";
my $data = $mech->content;
print $data;
Update - Using single quotes around the URL works. That worked for a second, and then I started getting a 500 error.
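For anyone wondering why the quoting matters, here is a minimal sketch of what I believe is happening: in a double-quoted string Perl treats @- as the built-in array of match offsets and interpolates it, so the URL that gets requested is not the one that was typed. The shortened path below is only for illustration.

```perl
use strict;
use warnings;

# Double quotes interpolate arrays, and @- is a built-in array
# (start offsets of the last successful match), so the "@-" in the
# path is replaced by that array's contents.
my $double = "line-movement/jets-@-dolphins.cfm";

# Single quotes do no interpolation, so the '@' stays in the path.
my $single = 'line-movement/jets-@-dolphins.cfm';

print "double: $double\n";
print "single: $single\n";
```

Escaping the character as \@ inside double quotes should have the same effect as switching to single quotes.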
Update II - When I set the timeout to 60 I am more likely to get the page. I also moved the code into a subroutine and ran the same process outside the subroutine. It does not work when it's in the subroutine, which makes me think it has something to do with how I'm passing the variable into the subroutine.
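A rough sketch of the timeout setup, in case it helps someone else. The fetch_page helper is a name I made up for illustration, and turning autocheck off is my own choice so that a 500 can be reported instead of killing the script. It also matters because get() returns a response object, so "or die" on it will not catch an HTTP error.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# timeout (in seconds) is handed on to LWP::UserAgent.
# autocheck => 0 makes get() return the failed response instead of dying.
my $mech = WWW::Mechanize->new( timeout => 60, autocheck => 0 );

# Hypothetical helper: returns the page content, or nothing after
# printing the HTTP status line of the failed request.
sub fetch_page {
    my ($url) = @_;
    my $response = $mech->get($url);
    unless ( $response->is_success ) {
        warn "GET $url failed: ", $response->status_line, "\n";
        return;
    }
    return $mech->content;
}
```

Calling it would look like my $data = fetch_page($url); followed by print $data if defined $data;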
Update III - I got it working with the subroutine. The fix was to use URI for the URLs.
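Roughly what the URI version looks like, as a sketch rather than my exact script. build_uri is a made-up helper name; the point is that the subroutine receives and returns a URI object built from a single-quoted string, so nothing gets interpolated on the way in.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: turn a raw string into a canonical URI object.
sub build_uri {
    my ($raw) = @_;
    return URI->new($raw)->canonical;
}

# Single quotes keep the literal '@' in the path.
my $uri = build_uri(
    'http://www.vegasinsider.com/nfl/odds/las-vegas/line-movement/jets-@-dolphins.cfm/date/9-07-08/'
);

print 'host: ', $uri->host, "\n";
print 'path: ', $uri->path, "\n";
```

The resulting object can be passed straight to $mech->get($uri), since WWW::Mechanize accepts either a string or a URI object there.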