PerlMonks  

Proper way to get redirected url from LWP::UserAgent response?

by cormanaz (Deacon)
on Jul 28, 2021 at 21:11 UTC (id://11135451)

cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Hi all. I am trying to expand some links found on Twitter. They are all t.co links. I have been able to expand those with a cheat:

```perl
use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;

our $ua = LWP::UserAgent->new;
$ua->max_redirect(10);

my $twitter = 'https://t.co/NqACDPYdD9';
my $tiny    = 'https://tinyurl.com/69rmnakp';   # the case that fails

say expand($twitter) // 'not expanded';

sub expand {
    my ($short) = @_;
    my $response = $ua->get($short);
    if ($response->is_success) {
        # t.co answers with 200 OK plus a "Refresh: 0;URL=..." header
        my $long = $response->header('refresh') // '';
        $long =~ s/0;URL=//;
        return $long;
    }
    else {
        say $response->status_line;
        return;
    }
}
```
But there is no refresh header ($response->{_headers}->{refresh}) for the tinyurl link. In fact, browsing through $response in my debugger, I can't find anything that looks like the expanded link. I know it's a valid link because my browser expands it.

What is the proper way to get the penultimate redirected-to link from the $response object?

Re: Proper way to get redirected url from LWP::UserAgent response?
by cormanaz (Deacon) on Jul 28, 2021 at 23:59 UTC
    I think I worked this out. I found in some WWW::Mechanize docs that the proper way to query the redirects is $response->redirects() (the method itself belongs to HTTP::Response). It seems that t.co does not send a 3xx redirect with a Location header (anyone know why?), so you have to get the target from the Refresh header instead. All the other shorteners (at least the slew of them I tested) do send one. So this code should work for most purposes.

```perl
use strict;
use warnings;
use feature ':5.10';
use LWP::UserAgent;

our $ua = LWP::UserAgent->new;
$ua->max_redirect(10);

my @links = qw(
    https://tinyurl.com/69rmnakp
    https://t.co/NqACDPYdD9
    https://bit.ly/3eikm4z
    http://apne.ws/HGpbLTW
);

foreach my $link (@links) {
    say "$link > " . expand($link);
}

sub expand {
    my ($short) = @_;
    my $long;
    my $response = $ua->get($short);
    if ($response->is_success) {
        # 3xx responses LWP followed to get here, oldest first
        my @redirects = $response->redirects();
        if (@redirects) {
            # Location of the first hop only; for multi-hop chains
            # $response->request->uri holds the final URL
            $long = $redirects[0]->header('location');
        }
        elsif ($response->header('refresh')) {
            # t.co style: 200 OK with "Refresh: 0;URL=..."
            $long = $response->header('refresh');
            $long =~ s/0;URL=//;
        }
        return $long // $short;   # fall back to the original link
    }
    else {
        return $short;
    }
}
```
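To see what redirects() actually returns without touching the network, here is a minimal sketch that builds a two-hop chain by hand with HTTP::Response and HTTP::Request (both from the HTTP::Message distribution LWP depends on). The example.* URLs are made up for illustration. When LWP::UserAgent follows redirects, the response it hands back is linked to the earlier 3xx responses through previous(), and its request() is the last request sent. The sketch mimics that: the list comes back oldest first, so $redirects[0] holds only the first hop, while $response->request->uri is where the chain ended.

```perl
use strict;
use warnings;
use feature 'say';
use HTTP::Request;
use HTTP::Response;

# Hypothetical URLs, for illustration only.
my @hops = (
    'https://short.example/abc',
    'https://middle.example/xyz',
    'https://final.example/article',
);

# Chain 301 responses the way LWP does: each carries a Location header
# and points back to the earlier response via previous().
my $prev;
for my $i (0 .. $#hops - 1) {
    my $r = HTTP::Response->new(301, 'Moved Permanently');
    $r->header(Location => $hops[$i + 1]);
    $r->request(HTTP::Request->new(GET => $hops[$i]));
    $r->previous($prev) if $prev;
    $prev = $r;
}

# The final 200 response, as returned by $ua->get
my $response = HTTP::Response->new(200, 'OK');
$response->request(HTTP::Request->new(GET => $hops[-1]));
$response->previous($prev);

my @redirects = $response->redirects;        # oldest first
say scalar @redirects;                       # 2
say $redirects[0]->header('Location');       # https://middle.example/xyz
say $redirects[-1]->header('Location');      # https://final.example/article
say $response->request->uri;                 # https://final.example/article
```

In a real call, $response->request->uri (or $response->base) is usually the simplest way to get the expanded link, since it is already absolute even when a Location header was relative.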

      This is mostly because HTTP::Response / HTTP::Message only concern themselves with HTTP redirect responses (3xx codes). Handling redirects via the Refresh header could be an interesting addition, but that mechanism is distinct: it is (usually) served with a 2xx status rather than a 3xx one.
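Since LWP does not follow a Refresh redirect, the target has to be extracted from the header value by hand. Below is a sketch of a somewhat more tolerant take on the `0;URL=` substitution in the code above: it accepts any delay, `;` or `,` as separator, optional whitespace, a case-insensitive `url=` prefix (or none), and optional quotes around the URL. The helper name refresh_target is hypothetical, and it only handles the header form; a `<meta http-equiv="refresh">` tag in the page body would need HTML parsing as well.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: pull the target URL out of a Refresh header value
# such as "0;URL=https://example.com/" or "5; url='/next'".
# Returns nothing when the value has no URL (a bare "30" just reloads).
sub refresh_target {
    my ($value) = @_;
    return unless defined $value;
    # delay, separator, optional "url=", then the URL itself
    my ($url) = $value =~ /^\s*\d+(?:\.\d*)?\s*[;,]\s*(?:url\s*=\s*)?(.*?)\s*$/i
        or return;
    $url =~ s/^(['"])(.*)\1$/$2/;    # strip matching quotes
    return length $url ? $url : ();
}

for my $v ('0;URL=https://example.com/article', "5; url='/next'", '30') {
    my $t = refresh_target($v);
    say defined $t ? $t : '(no URL)';
}
```

A relative target such as /next still has to be resolved against $response->base, for example with URI->new_abs($t, $response->base).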
