lampros21_7 has asked for the wisdom of the Perl Monks concerning the following question:

Hi to the fellow monks

I have written a test program that tries to use URI::URL to turn some relative URLs into absolute ones. Obviously I am doing something wrong, because the output should be a list of absolute URLs, but I am getting the relative ones instead. The code is below:

use URI::URL;
use WWW::Mechanize;

my $url = 'http://www.dcs.shef.ac.uk/';
my $url1 = URI::URL->new($url);
my $webcrawler = WWW::Mechanize->new();
my $content = $webcrawler->get($url);
my @links = map { $_->[0] } $webcrawler->links;
my $base = $url1->base();
my @absolute_links = map { $_= url($_, $base)->abs; } @links;
print "@linkarray \n";

I have tried printing both the (linkarray) and (links) arrays, but I get the same result: the URLs exactly as they appear in the HTML document (the relative links). Any idea what I should change? Maybe the map for the absolute_links array isn't written right, but I'm not sure how to change it. Help is greatly appreciated!

Replies are listed 'Best First'.
Re: Using URI::URL to go through an array of relative URL's
by fishbot_v2 (Chaplain) on Sep 09, 2005 at 13:36 UTC

    Try printing the value of $base; you will find it is empty. URI::URL->base() returns the base that was previously set for a relative URL, and you never set one. You already have the 'base' for converting to absolute: $url.

    use strict;
    use warnings;
    use URI::URL;
    use WWW::Mechanize;
    use Data::Dumper;

    my $url = 'http://www.dcs.shef.ac.uk/';
    my $webcrawler = WWW::Mechanize->new();
    my $content = $webcrawler->get($url) || die( "!!etc" );

    # Resolve each link's URL (element 0 of the link object) against $url.
    my @absolute_links = map { url( $_->[0], $url )->abs->as_string }
                         @{ $webcrawler->links };
    print Dumper \@absolute_links;

    In general, to debug an issue like this, you just need to step through printing your intermediate values. Data::Dumper is your friend.

    Addendum: The way you had written your map was destructive - it changed the old array as you created the new one. In general, try to think of your map block as an expression rather than a statement.
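
    To illustrate the addendum, here is a minimal sketch that uses plain strings instead of fetched links. The array contents and the 'http://example.com/' prefix are made up for the demonstration. Inside map, $_ is an alias for the current element, so assigning to it rewrites the source array.

    ```perl
    use strict;
    use warnings;

    # Hypothetical relative links, standing in for what links() might return.
    my @links = ( 'a.html', 'b/c.html' );

    # Destructive: $_ aliases each element of @copy, so the assignment
    # overwrites @copy while the new array is being built.
    my @copy        = @links;
    my @destructive = map { $_ = 'http://example.com/' . $_ } @copy;

    # Non-destructive: the block is only an expression, so @links is untouched.
    my @absolute = map { 'http://example.com/' . $_ } @links;

    print "copy:     @copy\n";
    print "links:    @links\n";
    print "absolute: @absolute\n";
    ```

    After this runs, @copy holds the prefixed strings (it was modified in place), while @links still holds the relative ones.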

      The problem is that $url will not always be my base URL. For example, if I take one of the URLs returned by the WWW::Mechanize->links() method and later fetch that page to retrieve its links, some of those pages may have a base URL that differs from the one in $url. Hope that makes sense.

      Basically, I want my script to find the base URL (because I am not sure what it will be each time) and then use it on the array of links I found to make all the URLs absolute. Hope someone can help. Thanks

        In that case, use the base() method from WWW::Mechanize:

        my $webcrawler = WWW::Mechanize->new();
        my $content = $webcrawler->get($url) || die( "!!etc" );
        my $base = $webcrawler->base();

        The base URL for a page isn't available until you actually look at the page. You can't divine that information from the URI object.
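
        Putting the pieces together, a rough sketch of the whole step might look like the following. The helper name absolute_links_for is made up for illustration, and the start URLs are assumed to come from the command line. It also assumes autocheck is switched on so that a failed get() dies instead of needing a manual check.

        ```perl
        use strict;
        use warnings;
        use URI;
        use WWW::Mechanize;

        # Hypothetical helper: fetch $url and return its links as absolute URL strings.
        sub absolute_links_for {
            my ( $mech, $url ) = @_;

            $mech->get($url);            # dies on failure because autocheck is on
            my $base = $mech->base();    # only known once the page has been fetched

            # Non-destructive map: resolve each link's URL against the page's base.
            return map { URI->new_abs( $_->url, $base )->as_string } $mech->links;
        }

        my $mech = WWW::Mechanize->new( autocheck => 1 );

        # Start URLs come from the command line; nothing is fetched without one.
        for my $start (@ARGV) {
            print "$_\n" for absolute_links_for( $mech, $start );
        }
        ```

        Because the base is read after every get(), each page's links are resolved against that page's own base rather than the original $url. The link objects also have a url_abs() method that may save the explicit URI->new_abs call; check the WWW::Mechanize::Link documentation for your installed version.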