monkeybus has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

Thanks for all your help so far.

Here be my latest problem.

Suppose I want to fetch any number of webpages from a site. The base URL is always the same, http://www.hkjc.com/english/racing/horse.asp?HorseNo= but the final four characters are different.

I have a text file (A-list) that looks like this.

    E246
    E060
    G108

and I want to run this code to fetch and strip the pages.

    #! /usr/bin/perl

    #open text file and fill array
    open(MYINPUTFILE, "<A-list"); # open for input
    my(@A_in) = <MYINPUTFILE>; # read file into list

    #remove whitespace
    @A_list = map { s/^\s+//; s/\s+$//; $_ } @A_in;
    close(MYINPUTFILE);

    #fetch each webpage
    foreach (@A_list) {
        chomp $A_list;
        my $url = "http://www.hkjc.com/english/racing/horse.asp?HorseNo=$A_list";
        use LWP::Simple;
        my $content = get $url;
        die "Couldn't get $url" unless defined $content;

        #strip away the HTML
        use HTML::Strip;
        my $hs = HTML::Strip->new();
        my $clean_text = $hs->parse( $content );
        $hs->eof;
        print $clean_text;
    }

But it seems to be returning UNDEF. I can't seem to make it work.

I will crack this language if it kills me.

Any comments?

Re: Grabbing many webpages with foreach
by friedo (Prior) on Jun 14, 2007 at 02:46 UTC

    You really need to use strict. If you had strict on, you'd see that $A_list was not declared anywhere. By default, a foreach loop puts each element into $_ unless you give it another variable to use. You should also check the return status of your open call to make sure there are no errors.
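
    To make the point about the loop variable concrete, here is a minimal sketch of the two foreach forms. It assumes the sample IDs from the question, and build_urls is a made-up helper name used only for illustration. Under strict, the original chomp $A_list line fails to compile with an error along the lines of: Global symbol "$A_list" requires explicit package name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL taken from the question; the IDs are the sample values from A-list.
my $base = 'http://www.hkjc.com/english/racing/horse.asp?HorseNo=';
my @ids  = ('E246', 'E060', 'G108');

# build_urls is a hypothetical helper, not part of any module.
sub build_urls {
    my ($prefix, @horse_ids) = @_;
    my @urls;

    # Form 1: no loop variable, so each element is aliased to $_.
    # foreach (@horse_ids) { push @urls, $prefix . $_ }

    # Form 2: a named lexical loop variable, which strict accepts.
    foreach my $id (@horse_ids) {
        push @urls, $prefix . $id;
    }
    return @urls;
}

# Print one URL per line; no network access happens here.
print "$_\n" for build_urls($base, @ids);
```

    Either form works. The named variable is usually easier to read once the loop body grows past a line or two.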

    It also doesn't make much sense to put use LWP::Simple; and use HTML::Strip in the middle of your loop -- use statements are executed at compile time and modules are (usually) only loaded once.
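
    A small sketch of why the position of use inside the loop doesn't matter. perlfunc documents use Module LIST as equivalent to BEGIN { require Module; Module->import(LIST); }, so a plain BEGIN block stands in for it here to avoid depending on any particular module. The @order array is only there for the demonstration.

```perl
use strict;
use warnings;

# Package variable so the BEGIN block and the run-time code share it.
our @order;

foreach my $i (1 .. 3) {
    push @order, "loop $i";

    # Runs once, while the file is being compiled, even though it is
    # written inside the loop body. A "use" statement behaves the same way.
    BEGIN { push @order, "compile time" }
}

print join(", ", @order), "\n";
# prints: compile time, loop 1, loop 2, loop 3
```

    So putting use lines at the top of the file doesn't change behavior, it just makes the code say what actually happens.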

    With that in mind, this should get you started.

        #!/usr/bin/perl

        use strict;
        use warnings;
        use LWP::Simple;

        #open text file and fill array
        open(MYINPUTFILE, "<A-list") or die "Can't open A-list: $!";
        my(@A_in) = <MYINPUTFILE>; # read file into list

        #remove whitespace
        my @A_list = map { s/^\s+//; s/\s+$//; $_ } @A_in;
        close(MYINPUTFILE);

        #fetch each webpage
        foreach my $num (@A_list) {
            chomp $num;
            my $url = "http://www.hkjc.com/english/racing/horse.asp?HorseNo=$num";
            my $content = get $url;
            die "Couldn't get $url" unless defined $content;
            print $content;
        }
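
    One possible variation on the file-reading part, offered as a sketch rather than a correction: a lexical filehandle with three-argument open, trimming each line as it is read and skipping blank ones. It assumes A-list holds one ID per line, and read_ids is a hypothetical helper name.

```perl
use strict;
use warnings;

# read_ids is a hypothetical helper name, not from any module.
sub read_ids {
    my ($path) = @_;

    # Three-argument open with a lexical filehandle; die reports the OS error.
    open(my $fh, '<', $path) or die "Can't open $path: $!";

    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;   # trim both ends, including the newline
        next if $line eq '';       # ignore blank lines
        push @ids, $line;
    }
    close($fh);
    return @ids;
}

# Only read the file if it exists, so the sketch runs anywhere.
if (-e 'A-list') {
    print "$_\n" for read_ids('A-list');
}
```

    Because the trim also removes the trailing newline, a separate chomp is not needed before an ID is appended to the base URL.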
      Thank you very much indeed.