Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

LWP can successfully follow redirects, but how can I get the URI of the page that LWP actually reads after the redirect? Reason: I need to scrape that page and resolve the relative paths of the links it contains.

I can obtain something with:

my $ua  = LWP::UserAgent->new;
my $ret = $ua->get($site);
my $url = $ret->request->uri . "";
print "URL returned: " . $url . "\n";

However, this script is NOT able to give me the target URI when the redirection is caused by a line in the HTML itself (rather than by a server-side redirect), such as <meta http-equiv="refresh" content="0; url=site/index.html" />

Re: get URI of redirected page with LWP
by choroba (Cardinal) on Feb 20, 2022 at 17:58 UTC
    Searching the documentation of LWP::UserAgent for "redirect" returns several relevant hits.

    For example:

    simple_request
    my $request = HTTP::Request->new( ... );
    my $res = $ua->simple_request( $request );
    my $res = $ua->simple_request( $request, $content_file );
    my $res = $ua->simple_request( $request, $content_cb );
    my $res = $ua->simple_request( $request, $content_cb, $read_size_hint );
    This method dispatches a single request and returns the response received. Arguments are the same as for the request method of LWP::UserAgent.

    The difference from request is that simple_request will not try to handle redirects or authentication responses. The request method will, in fact, invoke simple_request for each individual request it sends.
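
To illustrate the point about simple_request, here is a minimal sketch that follows HTTP redirects by hand and records every URI on the way. The name redirect_chain and the $max_hops limit are made up for this example; they are not part of LWP. It assumes each redirect carries a Location header and that a plain GET is appropriate for every hop.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI;

# Follow HTTP redirects manually and return the list of URIs visited,
# starting URI first, final URI last.
# $max_hops is a hypothetical safety limit for this sketch, not an LWP option.
sub redirect_chain {
    my ($ua, $url, $max_hops) = @_;
    $max_hops //= 7;
    my @chain = ($url);
    for (1 .. $max_hops) {
        # simple_request sends exactly one request and does not follow redirects.
        my $res = $ua->simple_request(HTTP::Request->new(GET => $url));
        last unless $res->is_redirect;
        my $location = $res->header('Location');
        last unless defined $location;
        # Location may be relative, so resolve it against the current URI.
        $url = URI->new_abs($location, $url)->as_string;
        push @chain, $url;
    }
    return @chain;
}

# Possible usage (needs network access; $site as in the question):
#   my $ua = LWP::UserAgent->new;
#   print "$_\n" for redirect_chain($ua, $site);
```

If you only need the end result, letting $ua->get follow the redirects and then reading $res->request->uri (as in the question) is simpler, and $res->redirects should give you the intermediate responses.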

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: get URI of redirected page with LWP
by kschwab (Vicar) on Feb 21, 2022 at 14:03 UTC

    LWP does not follow redirects from HTML meta tags; it only follows redirects that arrive as HTTP responses, such as a 301 or 302.

    You could parse the HTML yourself, or use something else that supports meta tags.
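
Parsing the HTML yourself could look roughly like the sketch below. It uses HTML::HeadParser (from the HTML-Parser distribution) to read the meta tag and URI to resolve the relative target. The function name meta_refresh_target is made up for this example, and the regex assumes the usual "delay; url=target" form of the content attribute.

```perl
use strict;
use warnings;
use HTML::HeadParser;
use URI;

# Return the absolute target of a <meta http-equiv="refresh"> tag,
# or undef if the document has none.
# $base is the URI the HTML was fetched from (with LWP, e.g. $res->base).
sub meta_refresh_target {
    my ($html, $base) = @_;
    my $parser = HTML::HeadParser->new;
    $parser->parse($html);
    # HTML::HeadParser exposes http-equiv meta tags as headers.
    my $refresh = $parser->header('Refresh');
    return unless defined $refresh;
    # Content looks like "0; url=site/index.html"; quotes around the URL are optional.
    my ($target) = $refresh =~ /;\s*url\s*=\s*["']?([^"'\s]+)/i
        or return;
    return URI->new_abs($target, $base)->as_string;
}

# Possible usage after a normal LWP fetch ($ua and $site as in the question):
#   my $res  = $ua->get($site);
#   my $next = meta_refresh_target($res->decoded_content, $res->base);
#   $res = $ua->get($next) if defined $next;
```

As far as I know, LWP::UserAgent has parse_head enabled by default, so $res->header('Refresh') may already hold the same content string without a second parse; treat that as something to verify against your LWP version.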

Re: get URI of redirected page with LWP
by Anonymous Monk on Feb 21, 2022 at 10:08 UTC