Georgio has asked for the wisdom of the Perl Monks concerning the following question:

Hi!
I'm trying to read and display some HTML pages from my script, like this:
#!/usr/bin/perl
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use LWP::UserAgent;

$cgi = new CGI;

# Create a user agent object
$ua = new LWP::UserAgent;
$ua->agent("AgentName/0.1 " . $ua->agent);

# Create a request
$req = new HTTP::Request GET => 'http://www.profit-position.com';
$req->content_type('application/x-www-form-urlencoded');
$req->content('match=www&errors=0');

# Pass request to the user agent and get a response back
$res = $ua->request($req);

# Check the outcome of the response
print $cgi->header();
if ($res->is_success) {
    print $res->content;
} else {
    print "Bad luck this time\n";
}
However, if the HTML code contains relative links (i.e. not full paths!), they aren't displayed.
Are there some modules to resolve such a problem?
Thanks!

Re: Read and Display HTML - pages with CGI-script.
by simonm (Vicar) on Sep 20, 2003 at 20:11 UTC
    There are modules which would allow you to go through and rewrite all of the URLs, but as a much easier fix, try adding an HTML base href tag to the returned document:
    my $content = $res->content;
    # s{}{} delimiters are used so the "/" in "</head" does not end the pattern
    $content =~ s{(</head)}{<base href="http://www.profit-position.com">$1}i;
    print $content;

      ... of course, a base tag should only be added if no base tag is already present :-))
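A minimal sketch of that check, assuming a simple regex test is good enough for the pages being proxied. The helper name add_base_href and the sample page are made up for illustration. It places the tag right after the opening <head> tag rather than before </head>, on the assumption that the base should come before any other element that carries a URL.

```perl
use strict;
use warnings;

# Hypothetical helper: add <base href="..."> to a page, but only when
# the page does not already contain a <base> tag.
sub add_base_href {
    my ($html, $base) = @_;

    # Leave the page alone if it already declares a base URL.
    return $html if $html =~ /<base\b/i;

    # Insert right after the first opening <head ...> tag.
    # A page with no <head> tag is returned unchanged.
    $html =~ s{(<head\b[^>]*>)}{$1<base href="$base">}i;
    return $html;
}

# Made-up sample page with one relative link.
my $page = '<html><head><title>Demo</title></head>'
         . '<body><a href="a.html">a</a></body></html>';

print add_base_href($page, 'http://www.profit-position.com/'), "\n";
```

In the original script, `print $res->content;` would become `print add_base_href($res->content, 'http://www.profit-position.com/');`. A regex can be fooled by a `<base` that sits inside a comment or a script, so an HTML parser would be the safer choice for arbitrary pages.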

      perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
      $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
      ($c = $d->accept())->get_request(); $c->send_response( new #in the
      HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: Read and Display HTML - pages with CGI-script.
by chromatic (Archbishop) on Sep 20, 2003 at 20:04 UTC

    I know of no serious module that could resolve a relative URL into a full URL. There are millions of web sites; how would you choose which one to use? Would index.pl?node=100 apply to Perl Monks, Everything 2, or Anime-Fu?

    Once you answer that question, it's not difficult to write a bit of code to absolutify relative URIs. I suggest the URI module. If there's no scheme and hostname, add the default.
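As a rough sketch of that suggestion, the URI module's new_abs constructor resolves a reference against a base URL. The base here is the URL from the original script; the three sample links are invented for illustration.

```perl
use strict;
use warnings;
use URI;

# Base URL taken from the request in the original script.
my $base = 'http://www.profit-position.com/';

# Invented examples of links as they might appear in the fetched page.
my @links = ('index.pl?node=100', '/images/logo.gif', 'http://perlmonks.org/');

for my $link (@links) {
    # new_abs resolves a relative reference against $base;
    # a link that is already absolute comes back unchanged.
    my $abs = URI->new_abs($link, $base);
    print "$link => $abs\n";
}
```

Finding the links in the page in the first place is a separate step that this sketch does not cover.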

      For what it's worth, the URL in this case seems to be hard-coded in the HTTP get line -- the OP's script is just proxying for this one page.