gnikol1 has asked for the wisdom of the Perl Monks concerning the following question:

Below I am trying to get the links from one page, but I cannot: it returns blank. The problem is not in authentication etc. Can you help me?
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my @imgs = ();
$url = "http://www.sn.no/";  # for instance
$ua = new LWP::UserAgent;
$ua->proxy(['http', 'ftp'] => 'http://proxy');

# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
$p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
$res = HTTP::Request->new(GET => $url);
$res->proxy_authorization_basic("user", "pass");
$res= $ua->request($res),sub {$p->parse($_[0])};

# Expand all image URLs to absolute ones
my $base = $res->base;
@imgs = map { $_ = url($_, $base)->abs; } @imgs;

# Print them out
print join("\n", @imgs), "\n";

# Set up a callback that collect image links
sub callback {
    my($tag, %attr) = @_;
    return if $tag ne 'a href ';  # we only look closer at <img ...>
    push(@imgs, values %attr);
}

Added code tags 2002-02-21 dvergin

Replies are listed 'Best First'.
Re: Retrieving Links from a HTML Page
by boo_radley (Parson) on Feb 21, 2002 at 17:55 UTC
    sub callback {
        my($tag, %attr) = @_;
        return if $tag ne 'a href ';  # we only look closer at <img ...>
        push(@imgs, values %attr);
    }
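    This makes no sense whatsoever. You're including an attribute in the tag name and tacking on a trailing space, so the comparison can never match: the callback receives the bare tag name (such as 'a' or 'img') and the attributes separately in %attr. Your comment also says you want to look at images, which doesn't jibe with the comparison you're making.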

    In fact, on closer review, this seems to be one of the examples from HTML::LinkExtor's POD. You might have more luck adapting the code from the synopsis, which prints out the links.
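
A minimal sketch of the kind of callback described above, assuming the goal is to collect the href of every anchor tag. The sample HTML and the base URL http://www.example.com/dir/ are made-up stand-ins for a fetched page, so the parsing can be tried without a network or proxy.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Hypothetical sample document and base URL, standing in for a fetched page.
my $base = 'http://www.example.com/dir/';
my $html = <<'HTML';
<html><body>
<a href="page.html">relative</a>
<a href="http://www.perlmonks.org/">absolute</a>
<img src="logo.png">
</body></html>
HTML

my @links;

# The callback receives the bare lowercase tag name, then attribute => value pairs.
sub callback {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' && defined $attr{href};
    # Resolve relative links against the base URL.
    push @links, URI->new_abs($attr{href}, $base)->as_string;
}

my $p = HTML::LinkExtor->new(\&callback);
$p->parse($html);
$p->eof;

print "$_\n" for @links;
```

When the content comes from LWP instead of a string, the parsing sub has to be passed as the second argument inside the call, as in `$ua->request($req, sub { $p->parse($_[0]) })`. If it is written after the closing parenthesis, it is just a separate expression that is never called, so the parser sees no data.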
Re: Retrieving Links from a HTML Page
by gav^ (Curate) on Feb 21, 2002 at 17:49 UTC
    Firstly, the page at http://www.sn.no/ doesn't contain any links (it is a frameset). Secondly, you might want to check the response to see if everything went OK:
    unless ($res->is_success) {
        # handle error here!
    }
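
One possible way to fill in that error handling, as a sketch only. The helper name response_error is hypothetical, and the two hand-built HTTP::Response objects stand in for whatever `$ua->request` would return, so the example runs without a network.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: returns an error string for a failed response,
# or undef when the request succeeded.
sub response_error {
    my ($res) = @_;
    return if $res->is_success;
    return 'Request failed: ' . $res->status_line;
}

# Hand-built responses standing in for real ones from $ua->request.
my $ok  = HTTP::Response->new(200, 'OK');
my $bad = HTTP::Response->new(407, 'Proxy Authentication Required');

for my $res ($ok, $bad) {
    my $err = response_error($res);
    print defined $err ? "$err\n" : "success\n";
}
```

Reporting `status_line` makes a proxy or authentication failure visible instead of leaving an empty result with no explanation.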

    gav^

      If the target page is a frameset, then gnikol1 might want to see Re: Browser Emulation for a way to read the contents of the individual frames.

      /J\
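
A rough sketch of one way to find the frame documents in a frameset, which is not necessarily the approach taken in the node linked above. It assumes HTML::LinkExtor is used again and that each frame's src would then be fetched and parsed separately. The frameset markup and the base URL http://www.example.com/ are invented for illustration.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Hypothetical frameset and base URL, standing in for the fetched page.
my $base = 'http://www.example.com/';
my $frameset = <<'HTML';
<html><frameset cols="20%,80%">
<frame src="menu.html" name="menu">
<frame src="/content/main.html" name="main">
</frameset></html>
HTML

my @frames;
my $p = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    # Keep only the src of each <frame> tag, made absolute.
    return unless $tag eq 'frame' && defined $attr{src};
    push @frames, URI->new_abs($attr{src}, $base)->as_string;
});
$p->parse($frameset);
$p->eof;

# Each of these URLs could then be requested and scanned for links in turn.
print "$_\n" for @frames;
```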