Grabbing many webpages with foreach

monkeybus has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

Thanks for all your help so far.

Here be my latest problem.

Suppose I want to fetch any number of webpages from a site. The base URL is always the same, http://www.hkjc.com/english/racing/horse.asp?HorseNo= but the final four characters are different.

I have a text file (A-list) that looks like this.

E246
E060
G108

and I want to run this code to fetch and strip the pages.

#! /usr/bin/perl

#open text file and fill array
open(MYINPUTFILE, "<A-list"); # open for input
my(@A_in) = <MYINPUTFILE>; # read file into list

#remove whitespace
@A_list = map { s/^\s+//; s/\s+$//; $_ } @A_in;

close(MYINPUTFILE);
 

#fetch each webpage
foreach (@A_list) {
chomp $A_list;

my $url = "http://www.hkjc.com/english/racing/horse.asp?HorseNo=$A_list";

   

  use LWP::Simple;
  my $content = get $url;
  die "Couldn't get $url" unless defined $content;


#strip away the HTML
use HTML::Strip;

  my $hs = HTML::Strip->new();

  my $clean_text = $hs->parse( $content );
  $hs->eof;

print $clean_text;
}

But it seems to be returning UNDEF. I can't seem to make it work.

I will crack this language if it kills me.

Any comments?


Re: Grabbing many webpages with foreach by friedo (Prior) on Jun 14, 2007 at 02:46 UTC
You really need to use strict. If you had strict on, you'd see that `$A_list` was not declared anywhere. By default, a `foreach` loop puts each element into `$_` unless you give it another variable to use. You should also check the return status of your `open` call to make sure there are no errors. It also doesn't make much sense to put `use LWP::Simple;` and `use HTML::Strip` in the middle of your loop -- `use` statements are executed at compile time and modules are (usually) only loaded once.

With that in mind, this should get you started.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

#open text file and fill array
open(MYINPUTFILE, "<A-list") or die "Can't open A-list: $!";
my(@A_in) = <MYINPUTFILE>; # read file into list

#remove whitespace
my @A_list = map { s/^\s+//; s/\s+$//; $_ } @A_in;

close(MYINPUTFILE);

#fetch each webpage
foreach my $num(@A_list) {
  chomp $num;
  my $url = "http://www.hkjc.com/english/racing/horse.asp?HorseNo=$num";
  my $content = get $url;
  die "Couldn't get $url" unless defined $content;
  print $content;
}
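
To make the point about `foreach` and its loop variable concrete, here is a minimal sketch that needs no network access. The helper name `build_urls` and the sample list are assumptions made up for illustration; they are not part of the code posted in this thread. It trims each entry, skips blank lines, and builds the URL using a named loop variable, then prints the results with the default `$_`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL taken from the question.
my $base = 'http://www.hkjc.com/english/racing/horse.asp?HorseNo=';

# Hypothetical helper (not from the thread): trim each line and build its URL.
sub build_urls {
    my @lines = @_;                 # work on a copy of the caller's list
    my @urls;
    foreach my $id (@lines) {       # named loop variable instead of $_
        $id =~ s/^\s+|\s+$//g;      # strip leading/trailing whitespace and the newline
        next if $id eq '';          # skip blank lines
        push @urls, $base . $id;
    }
    return @urls;
}

# Stand-in for the lines that would be read from A-list.
my @sample = ("E246\n", " E060 \n", "\n", "G108\n");

# With no loop variable given, foreach aliases each element to $_.
foreach (build_urls(@sample)) {
    print "$_\n";
}
```

Under `use strict`, writing `$A_list` inside either loop would fail at compile time, because only the array `@A_list` (or here `@sample`) exists, not a scalar of that name. That is the error the original script was hiding.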
Re^2: Grabbing many webpages with foreach by monkeybus (Acolyte) on Jun 14, 2007 at 03:17 UTC
Thank you very much indeed.