MiriamH has asked for the wisdom of the Perl Monks concerning the following question:
I have a list of 500+ links that I collected using WWW::Mechanize. I have some 'cleaning' code that takes a single web page and strips out all the markup, leaving just the page's text. I need a way to make my script automatically 'clean' each of the linked pages and then parse the data. This is what I have so far, but it doesn't work.
    #Download all the modules I used#
    use LWP::Simple;
    use HTML::TreeBuilder;
    use HTML::FormatText;
    use WWW::Mechanize;

    #Download original webpage and acquire 500+ Links#
    $url = "http://wx.toronto.ca/festevents.nsf/all?openform";
    my $mechanize = WWW::Mechanize->new(autocheck => 1);
    $mechanize->get($url);
    my $title = $mechanize->title;
    print "<b>$title</b><br />";
    my @links = $mechanize->links;

    ## THIS IS WHERE MY PROBLEM STARTS: I dont know how to use foreach loops. I thought if I put the "$link" variable as the "get ()" each time it would go through the loop it would "get" a different webpage. However it does not work even though no error shows##
    foreach my $link (@links) {
        # Retrieve the link URL
        my $href = $link->url;
        $URL1= get("$link");
        $Format=HTML::FormatText->new;
        $TreeBuilder=HTML::TreeBuilder->new;
        $TreeBuilder->parse($URL1);
        $Parsed=$Format->format($TreeBuilder);
        open(FILE, ">TorontoParties.txt");
        print FILE "$Parsed";
        close (FILE);
    }
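For reference, two things in the loop above look like likely causes of the silent failure. First, `get("$link")` stringifies the WWW::Mechanize::Link object itself rather than its URL, and `LWP::Simple::get` returns undef on failure without raising an error. Second, opening `>TorontoParties.txt` inside the loop truncates the file on every pass, so at best only the last page would survive. The following is a minimal sketch of one way the loop might be restructured, not a tested fix for this particular site. It assumes that every http(s) link on the index page should be fetched, that non-HTML responses can be skipped, and that UTF-8 output is acceptable; the names `$mech`, `$out` and `$formatter` are introduced here for illustration.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TreeBuilder;
use HTML::FormatText;

my $url = "http://wx.toronto.ca/festevents.nsf/all?openform";

# autocheck => 0 so that a single failed page does not abort the whole run
my $mech = WWW::Mechanize->new(autocheck => 0);
$mech->get($url);
die "Cannot fetch $url: " . $mech->status . "\n" unless $mech->success;

# Collect the links once, before the loop starts fetching other pages
my @links = $mech->links;

# Open the output file once, outside the loop, so earlier pages are kept
open(my $out, '>:encoding(UTF-8)', 'TorontoParties.txt')
    or die "Cannot open TorontoParties.txt: $!";

my $formatter = HTML::FormatText->new;

foreach my $link (@links) {
    # url_abs returns an absolute URI object, resolved against the index page
    my $href   = $link->url_abs;
    my $scheme = $href->scheme || '';
    next unless $scheme =~ /^https?$/;    # skip mailto:, javascript:, etc.

    $mech->get($href);
    unless ($mech->success) {
        warn "Skipping $href: " . $mech->status . "\n";
        next;
    }
    next unless $mech->is_html;           # assumption: only HTML pages matter

    # Build a tree from the fetched page and render it as plain text
    my $tree = HTML::TreeBuilder->new_from_content($mech->content);
    print {$out} "=== $href ===\n";
    print {$out} $formatter->format($tree), "\n";
    $tree->delete;                        # release the parse tree
}

close($out) or die "Cannot close TorontoParties.txt: $!";
```

Using `$link->url_abs` instead of `$link->url` matters when the page uses relative hrefs, and fetching through the same WWW::Mechanize object makes LWP::Simple unnecessary here.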
Replies are listed 'Best First'.

Re: Printing From Several Webpages
by toolic (Bishop) on Jul 06, 2012 at 18:26 UTC

Re: Printing From Several Webpages
by Corion (Patriarch) on Jul 06, 2012 at 18:27 UTC

Re: Printing From Several Webpages
by ig (Vicar) on Jul 06, 2012 at 20:06 UTC