Printing From Several Webpages

MiriamH has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of 500 web page links that I acquired using WWW::Mechanize. I have 'cleaning' code that can take any individual page and strip out all the markup, leaving me with just the page data. I need a way to make my script automate 'cleaning' each of the web pages and then parsing the data. This is what I have so far, but it doesn't work.

#Load all the modules I used#
use LWP::Simple;
use HTML::TreeBuilder;
use HTML::FormatText;
use WWW::Mechanize;

#Download original webpage and acquire 500+ Links#
$url = "http://wx.toronto.ca/festevents.nsf/all?openform";

my $mechanize = WWW::Mechanize->new(autocheck => 1);

$mechanize->get($url);


my $title = $mechanize->title;

print "<b>$title</b><br />";


my @links = $mechanize->links;


## THIS IS WHERE MY PROBLEM STARTS: I don't know how to use foreach loops.
## I thought if I put the "$link" variable as the "get ()" each time it
## would go through the loop it would "get" a different webpage.
## However it does not work even though no error shows

foreach my $link (@links) {

    # Retrieve the link URL
    my $href = $link->url;

    $URL1 = get("$link");

    $Format = HTML::FormatText->new;
    $TreeBuilder = HTML::TreeBuilder->new;
    $TreeBuilder->parse($URL1);
    $Parsed = $Format->format($TreeBuilder);

    open(FILE, ">TorontoParties.txt");
    print FILE "$Parsed";
    close (FILE);

}

Replies are listed 'Best First'.
Re: Printing From Several Webpages by toolic (Bishop) on Jul 06, 2012 at 18:26 UTC
`open(FILE, ">TorontoParties.txt"); print FILE "$Parsed";` Every time through the foreach loop, you open and write to the same file, and opening with ">" truncates it, so each pass overwrites the previous one. Perhaps you want to create a file with a different name each time through the loop. Or maybe you want to append to the same file every time (see open).
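A minimal sketch of the "same file" option, assuming the goal is one file containing the text of every page. The @pages list is a hypothetical stand-in for the formatted text produced inside the loop; only the file name comes from the original post. The file is opened once, before the loop, so nothing gets truncated between iterations.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the formatted text of each page.
my @pages = ("first page text\n", "second page text\n");

my $outfile = 'TorontoParties.txt';

# Open once, before the loop; '>' truncates only at this point.
open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";

for my $parsed (@pages) {
    # Each iteration adds to the already-open file handle.
    print {$out} $parsed;
}

close($out) or die "Cannot close $outfile: $!";
```

If the open has to stay inside the loop, using '>>' as the mode appends to the file instead of truncating it. Checking the return value of open, as above, also means a bad path or permission problem gets reported rather than silently ignored.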
Re: Printing From Several Webpages by Corion (Patriarch) on Jul 06, 2012 at 18:27 UTC
See the replies you got to your problem in Building a Spidering Application. Maybe you want to reduce your problem by eliminating WWW::Mechanize from the picture?
Re: Printing From Several Webpages by ig (Vicar) on Jul 06, 2012 at 20:06 UTC
Your description of your problem is a bit vague. You say you don't know how to use `foreach` loops, but I don't see anything wrong with how you used `foreach` in what you posted.

"However it does not work even though no error shows"

No error shows because you are not checking for and reporting errors. For example, the synopsis of LWP::Simple has this example: `use LWP::Simple; $content = get("http://www.sn.no/"); die "Couldn't get it!" unless defined $content;`
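The same check can be applied to every fetch in a loop. This is only a sketch: fetch_all is a hypothetical helper, not part of LWP::Simple, and it takes the fetching routine as a code reference so the example runs without network access. The one assumption about the fetcher is that it returns undef on failure, which is how LWP::Simple's get() is documented to behave.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch on each URL, keep what succeeds,
# and warn about (and collect) whatever fails.
sub fetch_all {
    my ($fetch, @urls) = @_;
    my (%content, @failed);
    for my $url (@urls) {
        my $body = $fetch->($url);
        if (defined $body) {
            $content{$url} = $body;
        }
        else {
            warn "Couldn't get $url\n";
            push @failed, $url;
        }
    }
    return (\%content, \@failed);
}

# Stand-in fetcher backed by a hash; the URLs are made up for the demo.
my %fake_web = ('http://example.com/a' => '<p>A</p>');
my ($content, $failed) = fetch_all(
    sub { $fake_web{ $_[0] } },
    'http://example.com/a',
    'http://example.com/missing',
);
print "fetched: ", scalar(keys %$content), ", failed: @$failed\n";
```

In the posted script, \&get from LWP::Simple could be passed as the fetcher. One thing such a warning would probably expose: the loop passes "$link", the WWW::Mechanize link object itself, to get(), while the $href it just retrieved goes unused. Passing the URL instead, for example $link->url_abs, is likely what was intended.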