I have a script that will download a webpage and its links, but I want the application to 'spider' onward and open those links as well. (The original web page is a list of links to webpages that are useful to me.)
Here is my code with comments
# Include the WWW::Mechanize module
use WWW::Mechanize;

# What URL shall we retrieve?
my $url = "http://wx.toronto.ca/festevents.nsf/all?openform";

# Create a new instance of WWW::Mechanize.
# Enabling autocheck checks each request to ensure it was successful,
# producing an error if not.
my $mechanize = WWW::Mechanize->new(autocheck => 1);

# Retrieve the page
$mechanize->get($url);

# Assign the page content to $page
my $page = $mechanize->content;

# Retrieve the page title
my $title = $mechanize->title;
print "<b>$title</b><br />";

# Place all of the links in an array
my @links = $mechanize->links;

# Loop through and output each link
foreach my $link (@links) {
    # Retrieve the link URL
    my $href = $link->url;

    # Retrieve the link text
    my $name = $link->text;

    print "<a href=\"$href\">$name</a>\n";
}
At the end I want to loop back over those links and download each one so I can parse out the useful information; something like the sketch below is what I have in mind. Please help! --Miriam
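Here is a rough, untested sketch of the "loop back and fetch each link" part. I'm assuming url_abs() is the right way to resolve relative links and that a second Mechanize object should be used for the sub-pages so the original listing stays loaded; please correct me if that's the wrong approach.

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $url  = "http://wx.toronto.ca/festevents.nsf/all?openform";
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($url);

    # links() returns a list of WWW::Mechanize::Link objects
    foreach my $link ($mech->links) {
        # url_abs() resolves relative URLs against the current page
        my $href = $link->url_abs;
        next unless $href =~ /^http/;   # skip mailto:, javascript:, etc.

        # Fetch each linked page with a separate Mechanize object so the
        # original listing page in $mech is left untouched
        my $sub_mech = WWW::Mechanize->new(autocheck => 0);
        $sub_mech->get($href);
        next unless $sub_mech->success;

        # Placeholder parsing step: just print the title for now.
        # The real extraction of "useful information" would go here.
        my $sub_title = $sub_mech->title // '(no title)';
        print "$sub_title\t$href\n";
    }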