Follow Redirect Before Accessing Web Content

cdherold has asked for the wisdom of the Perl Monks concerning the following question:

Hi again Monks,

I am trying to get the links from the page a URL redirects to, when the URL in my code actually does redirect (sometimes it does and sometimes it doesn't). How do I have Perl follow that redirect (if there is one) before it starts extracting links? I am currently using LWP::UserAgent and the following code (which right now doesn't follow redirects):

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI;

my $url = "http://this.url.may.redirect.to.url.with.content";

my $ua = LWP::UserAgent->new;

# Set up a callback that collects links
my @links = ();

sub callback {
    my ($tag, %attr) = @_;
    return if $tag ne 'a';
    push(@links, values %attr);
}

# Make the parser.
my $p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
my $res = $ua->request(HTTP::Request->new(GET => $url),
                       sub { $p->parse($_[0]) });
$p->eof;    # flush any text still buffered in the parser
die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;

# Expand all URLs to absolute ones
my $base = $res->base;
@links = map { URI->new_abs($_, $base)->as_string } @links;

print "<b>Original Links:</b> <p>@links<p>";

Thanks Monks!

Chris


Re: Follow Redirect Before Accessing Web Content by Corion (Patriarch) on Sep 08, 2007 at 06:51 UTC
I recommend WWW::Mechanize: it transparently follows redirects and behaves more like a browser. If you want to do it by hand, you have to check the status code of the response in `$res` and decide manually whether it's an error (you should check for that!), a success (2xx) or a redirect (3xx).
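A minimal sketch of the by-hand approach, assuming plain LWP::UserAgent. Its `simple_request` method sends a single request and never follows redirects, so the loop below looks at each status code itself: return on 2xx, follow the `Location` header on 3xx, die on anything else. (For comparison, LWP::UserAgent's `request` method already follows redirects for GET and HEAD by default, as controlled by `requests_redirectable`.) The helper name `fetch_following_redirects` and the five-hop limit are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI;

# Hypothetical helper: GET $url, following up to $max_hops redirects by hand.
# simple_request() sends exactly one request and never follows redirects,
# so every hop is inspected here.
sub fetch_following_redirects {
    my ($ua, $url, $max_hops) = @_;
    $max_hops //= 5;    # assumed limit, to stop redirect loops

    for (0 .. $max_hops) {
        my $res = $ua->simple_request(HTTP::Request->new(GET => $url));

        # 2xx: this is the page with the content
        return $res if $res->is_success;

        # 3xx: the next URL is in the Location header
        if ($res->is_redirect) {
            my $location = $res->header('Location')
                or die "Redirect from $url has no Location header\n";
            # Location may be relative, so resolve it against the current URL
            $url = URI->new_abs($location, $url)->as_string;
            next;
        }

        # anything else (4xx, 5xx, ...) is treated as an error
        die "GET $url failed: ", $res->status_line, "\n";
    }
    die "Too many redirects (more than $max_hops)\n";
}

# Only touch the network when a URL is given on the command line
if (@ARGV) {
    my $ua  = LWP::UserAgent->new;
    my $res = fetch_following_redirects($ua, $ARGV[0]);
    print "Final URL: ", $res->request->uri, "\n";
}
```

Run it as `perl follow.pl http://this.url.may.redirect.to.url.with.content` (the script name is hypothetical). The returned response is the final page, so its `decoded_content` could be handed to the HTML::LinkExtor parser from the question, with `$res->base` as the base for absolute links.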