gnikol1 has asked for the wisdom of the Perl Monks concerning the following question:
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my @links = ();
$url = "http://www.sn.no/";   # for instance
$ua  = new LWP::UserAgent;
$ua->proxy(['http', 'ftp'] => 'http://proxy');

# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
$p = HTML::LinkExtor->new(\&callback);

# Request the document and parse it as it arrives
$req = HTTP::Request->new(GET => $url);
$req->proxy_authorization_basic("user", "pass");
$res = $ua->request($req, sub { $p->parse($_[0]) });

# Expand all link URLs to absolute ones
my $base = $res->base;
@links = map { url($_, $base)->abs } @links;

# Print them out
print join("\n", @links), "\n";

# Callback that collects <a href="..."> links
sub callback {
    my ($tag, %attr) = @_;
    return if $tag ne 'a';    # we only look closer at <a ...>
    push(@links, $attr{href}) if defined $attr{href};
}
Added code tags 2002-02-21 dvergin
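For comparison, a minimal self-contained sketch of the same task (not the poster's code): fetch the page first, then let HTML::LinkExtor accumulate the links without a callback and resolve them against the response base. The proxy and proxy-authorization setup from the question is omitted here, and "http://www.sn.no/" is just the example URL reused from above.

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::LinkExtor;
    use URI::URL;

    my $url = "http://www.sn.no/";
    my $ua  = LWP::UserAgent->new;

    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line unless $res->is_success;

    # With no callback, HTML::LinkExtor simply collects every link it sees.
    my $p = HTML::LinkExtor->new;
    $p->parse($res->decoded_content);
    $p->eof;

    # links() returns [$tag, attr => url, ...] tuples; keep only <a href="...">
    # and make each one absolute against the document base.
    my $base  = $res->base;
    my @links = map  { url($_->[2], $base)->abs }
                grep { $_->[0] eq 'a' && $_->[1] eq 'href' } $p->links;

    print "$_\n" for @links;

This trades the streaming parse-as-it-arrives approach for a simpler fetch-then-parse flow, which is usually fine unless the pages are very large.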
Replies are listed 'Best First'.
Re: Retrieving Links from a HTML Page
by boo_radley (Parson) on Feb 21, 2002 at 17:55 UTC
Re: Retrieving Links from a HTML Page
by gav^ (Curate) on Feb 21, 2002 at 17:49 UTC
by gellyfish (Monsignor) on Feb 22, 2002 at 11:06 UTC