SergioQ has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I've been making a scraper using both LWP::UserAgent and WWW::Mechanize. So far it's been mostly smooth sailing, but now I've hit an interesting speed bump.

While following links, I find that some of them redirect from the page I'm scraping to outside pages.

For example:
I am scraping Google.
I follow a link from Google that is an internal redirect to an outside site, let's say www.amazon.com.
What method can I use to find out that I've ended up on www.amazon.com after following the link www.google.com/redirect/offsite/etc/etc/etc/?

Thank you.

Replies are listed 'Best First'.
Re: How can I find where I've been redirected to, in my scrapper?
by jcb (Parson) on Jan 11, 2021 at 03:16 UTC

    As you have found, the response object can report the final URL that was retrieved. If you actually need the redirect responses themselves, the previous and redirects methods on the final HTTP::Response object will give you the intermediate responses.
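To illustrate the previous and redirects methods mentioned above, here is a minimal sketch that lists every URL visited on the way to the final response. It builds a two-hop chain by hand so it runs without network access; the example.com and example.org URLs and the redirect_chain helper are made up for illustration. With a real fetch, $final would simply be whatever $ua->request(...) returned.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hand-built chain standing in for a real fetch (placeholder URLs).
my $first = HTTP::Response->new(302, 'Found',
    [ Location => 'http://www.example.org/landing' ]);
$first->request(HTTP::Request->new(GET => 'http://www.example.com/redirect/offsite'));

my $final = HTTP::Response->new(200, 'OK');
$final->request(HTTP::Request->new(GET => 'http://www.example.org/landing'));
$final->previous($first);    # link the final response back to the redirect

# Hypothetical helper: every URL requested, oldest first, ending with the final one.
sub redirect_chain {
    my ($resp) = @_;
    my @hops = map { $_->request->uri->as_string } $resp->redirects;
    return (@hops, $resp->request->uri->as_string);
}

my @chain = redirect_chain($final);
print "$_\n" for @chain;
```

The last element is the URL that was actually retrieved, taken from $resp->request->uri. That can differ from $resp->base when the server sends a Content-Base or Content-Location header.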

    If you need to decide whether or not to follow the redirect, you can use the simple_request method instead of the request method on LWP::UserAgent.
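A rough sketch of the simple_request approach follows. simple_request returns the 3xx response itself instead of following it, so the script can inspect the Location header and decide what to do. The request_send handler here only stubs out the network so the example runs offline (drop it when talking to a real site); the URLs and the %allowed host list are assumptions for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use URI;

my $ua = LWP::UserAgent->new(timeout => 10);

# Offline stub: returning a response from request_send skips the real fetch.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    return HTTP::Response->new(302, 'Found',
        [ Location => 'http://www.example.org/landing' ])
        if $request->uri->path eq '/redirect/offsite';
    return HTTP::Response->new(200, 'OK');
});

# simple_request does NOT follow redirects, unlike request.
my $resp = $ua->simple_request(
    HTTP::Request->new(GET => 'http://www.example.com/redirect/offsite'));

my %allowed = ('www.example.org' => 1);    # hypothetical policy: hosts we agree to visit
my ($target, $final);

if ($resp->is_redirect) {
    # Location may be relative, so resolve it against the URL we requested.
    $target = URI->new_abs($resp->header('Location'), $resp->request->uri);
    print "Redirect offered to: $target\n";

    if ($allowed{ $target->host }) {
        $final = $ua->simple_request(HTTP::Request->new(GET => $target));
        print "Followed, status: ", $final->code, "\n";
    }
}
```

Following each hop by hand like this means the script also has to enforce its own limit on the number of redirects, which request normally handles through max_redirect.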

Re: How can I find where I've been redirected to, in my scrapper?
by SergioQ (Scribe) on Jan 10, 2021 at 06:50 UTC

    Well, I added some code, and it worked, but I need to make sure it's not a blind link.

    use LWP::UserAgent;
    use HTTP::Request;

    # $newgooleredirect is the link that takes me where I want to go, but via Google's redirect system
    my $requester = HTTP::Request->new(GET => $newgooleredirect);
    my $lwper = LWP::UserAgent->new(
        agent      => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0',
        cookie_jar => {},
        timeout    => 10,
    );
    my $resp = $lwper->request($requester);
    print $resp->base;    # base is a method of HTTP::Response

    The result was the correct link, www.amazon.com.