Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Oh monks, I've imported a web page using the get() function from LWP::Simple. Now I want to replace all relative URLs with absolute URLs (primarily so that image links don't break). I don't want to use lwp-rget, because I don't want to store all the images locally. Can someone help me? Humble thanks.
  • Comment on replace relative links with absolute links

•Re: replace relative links with absolute links
by merlyn (Sage) on Jan 06, 2003 at 18:46 UTC
      I liked how you phrased these standard responses before. I just had to laugh when I read how you are writing them now: "Maybe this ... would be a good start".

      Come on. You are wimping out! People who complain about you have absolutely no life. Don't take them seriously. Most of them are probably still living at home. I like how you put things before. I liked your style. Don't change!

      s/newrandal/oldrandal/g;

        I'm not wimping out at all. Sometimes, I feel something I've written in a column is spot-on advice for the question at hand. I didn't feel that was the case on this one... some of the code is reusable, while other parts of it are completely distracting. And the parts that were applicable weren't some of my better-written stuff. {grin}

        So, I couldn't say "I have a column on that". I had to say "Some parts of this column are applicable, perhaps."

        No wimp. Just truth.

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

Re: replace relative links with absolute links
by Jenda (Abbot) on Jan 06, 2003 at 19:46 UTC

    That's unnecessarily complex. Just stick a

    <base href="$the_url_you_fetched">
    into the <HEAD> and you are done.

    Jenda
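A minimal sketch of the <base> approach, not code from the thread. The helper name add_base_tag, the example.com URL, and the fallback for pages without a <head> are assumptions made for illustration; real pages may need more care than a regex gives.

```perl
use strict;
use warnings;

# Hypothetical helper: insert <base href="..."> right after the opening <head> tag.
sub add_base_tag {
    my ($html, $base_url) = @_;

    # Escape the characters that could break out of the attribute value.
    (my $href = $base_url) =~ s/&/&amp;/g;
    $href =~ s/"/&quot;/g;
    my $tag = qq{<base href="$href">};

    # A page that already declares a base is left untouched.
    return $html if $html =~ /<base\b/i;

    # Usual case: put the tag directly after <head> (\b keeps <header> from matching).
    return $html if $html =~ s/(<head\b[^>]*>)/$1$tag/i;

    # Assumed fallback for pages with no <head>: prepend the tag.
    return $tag . $html;
}

my $page = '<html><head><title>t</title></head><body><img src="a.gif"></body></html>';
print add_base_tag($page, 'http://www.example.com/dir/page.html'), "\n";
```

The relative links in the body stay as they are; the browser resolves them against the href given in the <base> tag.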

      Thank you! With some code to handle unusual cases, that worked perfectly.
Re: replace relative links with absolute links
by meetraz (Hermit) on Jan 06, 2003 at 19:19 UTC
    I would suggest using HTML::Parser to extract the relative URLs, and then using the URI module to turn them into absolute ones (see the abs() method).
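As a rough illustration of the URI half of that suggestion, the sketch below resolves a few relative references against a base. The helper name absolutize and the example.com URLs are made up for the example; URI->new_abs is the constructor form of the abs() method.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve one (possibly relative) URL against a base URL.
sub absolutize {
    my ($rel, $base) = @_;
    return URI->new_abs($rel, $base)->as_string;
}

my $base = 'http://www.example.com/docs/index.html';
for my $rel ('images/logo.gif', '../style.css', '/top.html', 'http://other.example.org/x.png') {
    print "$rel => ", absolutize($rel, $base), "\n";
}
# images/logo.gif => http://www.example.com/docs/images/logo.gif
# ../style.css => http://www.example.com/style.css
# /top.html => http://www.example.com/top.html
# http://other.example.org/x.png => http://other.example.org/x.png
```

URLs that are already absolute come back unchanged, so the same call can be applied to every link on the page.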
      Good idea. You can also use the HTML::LinkExtor and URI::URL modules. A simple example:

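      #!/usr/bin/perl -w
      use strict;
      use HTML::LinkExtor;
      use HTTP::Request;
      use LWP::UserAgent;
      use URI::URL;    # exports url()

      my @imgs;
      my $lx = HTML::LinkExtor->new(\&img);

      # Fetch the page and remember its base URL
      my $respond = LWP::UserAgent->new->request(HTTP::Request->new(GET => 'http://www.ibm.com'));
      my $base = $respond->base;

      $lx->parse($respond->content);
      $lx->eof;    # signal end of document to the parser

      for (@imgs) {
          print "Relative: $_\n";
          print "Full: ", url($_, $base)->abs, "\n\n";
      }

      # Callback: collect the link attributes of every <img> tag
      sub img {
          my ($tag, %links) = @_;
          push(@imgs, values %links) if $tag eq 'img';
      }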
        HTML::LinkExtor is for extracting links, not altering HTML content. It's definitely the wrong tool for this job. HTML::Parser or HTML::TokeParser::Simple is what you'd wanna use here.
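One way the HTML::Parser route might look, as a sketch only: every event is copied through unchanged except start tags that carry a link attribute, which are rebuilt with absolute URLs. The helper name absolutize_html, the short list of link attributes, and the example.com URLs are assumptions, not something given in the thread.

```perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities qw(encode_entities);
use URI;

# Attributes treated as links here; a deliberately short, assumed list.
my %is_link_attr = map { $_ => 1 } qw(href src background action);

# Hypothetical helper: return $html with link attributes made absolute against $base.
sub absolutize_html {
    my ($html, $base) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [
            sub {
                my ($tagname, $attr, $attrseq, $text) = @_;

                # Tags without link attributes are passed through as written.
                unless (grep { $is_link_attr{$_} } @$attrseq) {
                    $out .= $text;
                    return;
                }

                # Rebuild the tag (names come back lower-cased, values entity-decoded).
                my $tag = "<$tagname";
                for my $name (@$attrseq) {
                    next if $name eq '/';    # trailing slash of <img ... /> is reported as an attribute
                    my $value = $attr->{$name};
                    $value = URI->new_abs($value, $base)->as_string if $is_link_attr{$name};
                    $tag .= sprintf ' %s="%s"', $name, encode_entities($value, '<>&"');
                }
                $tag .= $text =~ m{/>\z} ? ' />' : '>';
                $out .= $tag;
            },
            'tagname, attr, attrseq, text',
        ],
        # Everything else (text, end tags, comments, declarations) is copied verbatim.
        default_h => [ sub { $out .= $_[0] if defined $_[0] }, 'text' ],
    );

    $parser->parse($html);
    $parser->eof;
    return $out;
}

print absolutize_html(
    '<a href="page2.html">next</a><img src="/img/x.gif" alt="x">',
    'http://www.example.com/dir/index.html',
), "\n";
```

Rebuilding only the tags that need it keeps the rest of the page byte-for-byte identical, at the cost of normalizing quoting and case inside the tags that are rewritten.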

        update: here are some relevant links


        MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
        ** The Third rule of perl club is a statement of fact: pod is sexy.