in reply to Converting relative URLs to absolute URLs

Here's a more formal example from the archives:
#!/usr/bin/perl
use strict;
use warnings;
use URI::URL;

use constant BASE => 'http://www.pair.com/pair/support/index.html';

print "BASE is ", BASE, "\n\n";

# Read one URL per line from the __DATA__ section below.
while ( my $path = <DATA> ) {
    chomp $path;
    tryit($path);
}

# Resolve a (possibly relative) URL against BASE and print the result.
sub tryit {
    my $relative = shift;
    my $path = URI::URL->new($relative)->abs( BASE, 1 );
    print "$relative ->\n\t$path\n\n";
}

__DATA__
http://www.pair.com
/index.html
https://www.pairnic.com/faq.m
search/
library.html

I'm not really a human, but I play one on earth. flash japh

Re: Re: Converting relative URLs to absolute URLs
by Anonymous Monk on Mar 05, 2004 at 03:38 UTC
    How do I use URI::URL to find out whether a link is relative or absolute? I am feeding it parsed links from HTML::SimpleLinkExtor; $_ is the URL of the page the links were taken from, and it's all done in a loop. I understand the print url("foo/test.html")->abs("http://www.sun.com/"); part, but how do I know whether foo has a slash in front of it, or whether sun.com has a slash at the end? I can't find the URI documentation, and there are no examples on CPAN. Thanks
      It's no problem; see:
      use URI::URL;
      print url("foo/test.html")->abs("http://www.sun.com/"), "\n";
      print url("/foo/test.html")->abs("http://www.sun.com/"), "\n";
      print url("foo/test.html")->abs("http://www.sun.com"), "\n";
      print url("http://www.sun.com/foo/test.html")->abs("http://www.sun.com/"), "\n";
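To make the relative-versus-absolute question concrete, here is a minimal sketch using the plain URI module rather than URI::URL. It assumes that "relative" simply means "has no scheme"; is_relative is a hypothetical helper name, not part of URI.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: treat a link as relative when it carries no scheme
# (no "http:", "https:", "mailto:", and so on).
sub is_relative {
    my ($link) = @_;
    my $scheme = URI->new($link)->scheme;
    return !defined $scheme;
}

# The base may be written with or without a trailing slash.
my $base = 'http://www.sun.com';

for my $link ('foo/test.html', '/foo/test.html', 'http://www.sun.com/foo/test.html') {
    my $kind = is_relative($link) ? 'relative' : 'absolute';
    my $abs  = URI->new_abs($link, $base);   # absolute links pass through unchanged
    print "$link ($kind) -> $abs\n";
}
```

With this base, all three links should resolve to the same URL, so in practice the check is rarely needed: new_abs can be called on every link regardless of its form.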
        But I don't know what form the URLs will be in. Here's a flowchart:

        Open FILE1.
        Open FILE2.
        While FILE1 has URLs left, read each one (it is in $_) and use LWP::RobotUA to fetch its content.
        Parse the content for links using HTML::SimpleLinkExtor.
        Take the links from the link extractor, run them through URI to make them all absolute, and print them to FILE2.
        When FILE1 is done, go to FILE2 and do the same thing.

        How would the URI portion of this work? I can't use esskar's method because I don't know in advance what the base or the link will be; the program will have to figure that out.

        Thanks
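One possible answer to the URI portion, as a rough sketch: the base does not have to be known in advance, because it is just the URL of the page the links came from (the $_ in the flowchart). The helper name absolutize_links and the sample data below are assumptions for illustration; in the real program the links would come from HTML::SimpleLinkExtor, and when fetching with LWP the base could instead be taken from the response's base method.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve every extracted link against the URL of the
# page it was found on. Links that are already absolute come back unchanged.
sub absolutize_links {
    my ($page_url, @links) = @_;
    return map { URI->new_abs($_, $page_url)->as_string } @links;
}

# Stand-in data: $page_url plays the role of the URL read from FILE1,
# @links the role of the link extractor's output.
my $page_url = 'http://www.pair.com/pair/support/index.html';
my @links    = ('http://www.pair.com', '/index.html', 'search/', 'library.html');

# In the real program these lines would be printed to FILE2.
print "$_\n" for absolutize_links($page_url, @links);
```

Because new_abs handles both relative and absolute input, the program never has to decide which kind of link it is looking at.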