in reply to Ending a loop of content of LWP's get-function

You want to loop over the URLs, with the fetch inside the loop.

    my @urls = (
        "http://localhost:8080/html.htm",
    );

    for my $url (@urls) {
        my $html = get($url)
            or die "Couldn't fetch page.";
        $html =~ ...
        ...
    }

Or if you plan on adding to @urls,

    my @urls = (
        "http://localhost:8080/html.htm",
    );

    while (@urls) {
        my $url = shift(@urls);
        my $html = get($url)
            or die "Couldn't fetch page.";
        $html =~ ...
        ...
        push @urls, $new_url;  # or @new_urls
        ...
    }

Using push results in a breadth-first search.
Using unshift results in a depth-first search instead.
The former is almost surely most desirable here.
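
To see the difference in visiting order without touching the network, here is a minimal sketch that walks a small made-up link graph. The %links hash and its page names are hypothetical stand-ins for fetched pages and the links found on them; the crawl sub and the %seen guard against revisiting pages are additions for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical link graph standing in for fetched pages:
# each key is a "page", each value lists the links found on it.
my %links = (
    a => [qw(b c)],
    b => [qw(d)],
    c => [qw(e)],
    d => [],
    e => [],
);

# Walk the graph from $start, adding newly found links with either
# push (queue) or unshift (stack). Returns the visiting order.
sub crawl {
    my ($start, $mode) = @_;
    my @todo = ($start);
    my %seen = ($start => 1);   # never queue the same page twice
    my @order;
    while (@todo) {
        my $page = shift @todo;
        push @order, $page;
        my @new = grep { !$seen{$_}++ } @{ $links{$page} };
        if ($mode eq 'push') { push @todo, @new }
        else                 { unshift @todo, @new }
    }
    return @order;
}

print "push:    @{[ crawl('a', 'push') ]}\n";     # a b c d e
print "unshift: @{[ crawl('a', 'unshift') ]}\n";  # a b d c e
```

With push, both links of page a are visited before anything deeper; with unshift, the walk follows b down to d before coming back to c.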

Re^2: Ending a loop of content of LWP's get-function
by turbolofi (Acolyte) on Mar 27, 2009 at 17:32 UTC
    Thank you for your quick reply, and for the pointers to push and unshift.
    I'm still struggling to get it to work correctly, though. I've tried both of your suggestions, with two different results:
        #!/usr/bin/perl -w
        use warnings;
        use strict;
        use LWP::Simple;

        my ($html, $url);
        my $count = 0;
        my @urls = (
            "http://localhost:8080/html.htm",
        );

        for my $url (@urls) {
            my $html = get($url)
                or die "Couldn't fetch page.";
            # match regexp and capture backreference to $2, or die with error
            $html =~ m{<(a class=\"smallV110\" href=\"/)(.*?)\">}
                || die "couldn't match";
            $url = $2;
            print "$url\n";
            $count++;
            print "$count\n";
        }
    This gives only one line of content from the retrieved file. It loops until it has found one occurrence of the matched pattern, then quits the loop. I'd like it to continue until the whole file has been matched. Is it possible to use "length" to achieve this?
    The other example gives a more serious error:
        #!/usr/bin/perl -w
        use warnings;
        use strict;
        use LWP::Simple;

        my ($html, $url);
        my $count = 0;
        my $new_url;
        my @urls = (
            "http://localhost:8080/html.htm",
        );

        while (@urls) {
            my $url = shift(@urls);
            my $html = get($url)
                or die "Couldn't fetch page.";
            # match regexp and capture backreference to $2, or die with error
            $html =~ m{<(a class=\"smallV110\" href=\"/)(.*?)\">}
                || die "couldn't match";
            $url = $2;
            print "$url\n";
            push @urls, $new_url;  # or @new_urls
        }
    This code gives, as in the case above, one matched result from the retrieved file, then quits with the error:
        Use of uninitialized value $url in pattern match (m//) at C:/Perl/lib/LWP/Simple.pm line 131.
        Couldn't fetch page. at retrieve.pl line 13.
    I should note that I use ActivePerl, though I doubt very much that this is the cause of the latter problem. Again, I appreciate any help!
          $url = $2;              # <-- called $url here
          print "$url\n";
          push @urls, $new_url;   # or @new_urls  <-- called $new_url here

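      Just rename one so that both lines use the same variable. As written, $new_url is never assigned, so an undefined value is pushed onto @urls, and the next call to get() warns about it and fails.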

      Also, it seems you want to search for the pattern multiple times. You'll need the "g" modifier for that.

          while ($html =~ m{...}g) {
              my $new_url = $2;
              print "$new_url\n";
              push @urls, $new_url;
          }
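
For a version you can run without a server, here is a small sketch of the same loop against a hard-coded string. The sample markup and file names are made up for illustration; a real script would take $html from get($url). Since this pattern has only one capture group, the link ends up in $1 rather than $2.

```perl
use strict;
use warnings;

# Made-up page content; a real script would get this from get($url).
my $html = join "\n",
    '<a class="smallV110" href="/first.htm">one</a>',
    '<a class="other" href="/skip.htm">no</a>',
    '<a class="smallV110" href="/second.htm">two</a>';

my @found;
# In scalar context, /g resumes after the previous match each time,
# so the loop ends once no further match exists.
while ($html =~ m{<a class="smallV110" href="/(.*?)">}g) {
    push @found, $1;
}

print "$_\n" for @found;   # prints first.htm, then second.htm
```

Without the "g", the condition would match the first link on every pass and the loop would never end.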