cdherold has asked for the wisdom of the Perl Monks concerning the following question:

Hi again Monks,

I am trying to extract the links from a page whose URL may or may not redirect (sometimes it does and sometimes it doesn't). How do I have Perl follow that redirect, if there is one, before it starts trying to extract links? I am currently using LWP::UserAgent and the following code (which right now doesn't follow redirects) ...

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI::URL;    # provides url()

my $url = "http://this.url.may.redirect.to.url.with.content";
my $ua  = LWP::UserAgent->new;

# Set up a callback that collects links
my @links = ();
sub callback {
    my ($tag, %attr) = @_;
    return if $tag ne 'a';
    push(@links, values %attr);
}

# Make the parser.
my $p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
my $res = $ua->request(HTTP::Request->new(GET => $url), sub { $p->parse($_[0]) });

# Expand all URLs to absolute ones
my $base = $res->base;
@links = map { url($_, $base)->abs } @links;

print "<b>Original Links:</b> <p>@links<p>";

Thanks Monks!

Chris

Re: Follow Redirect Before Accessing Web Content
by Corion (Patriarch) on Sep 08, 2007 at 06:51 UTC

    I recommend WWW::Mechanize - it transparently follows redirects and behaves more like a browser. If you want to do it by hand, you have to check the return code you get in $res and check manually whether it's an error (you should do that!), a result (2xx) or a redirect (3xx).
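To make the "by hand" route concrete, here is a minimal sketch rather than a definitive implementation. It uses `simple_request`, which performs exactly one request and does not follow redirects on its own, and it inspects each response as described above. The helper name `fetch_following_redirects` and the hop limit are my own inventions for illustration. As far as I know, `$ua->request` already follows redirects for GET by default (up to `max_redirect` hops), so it may be worth checking `$res->code` and `$res->base` in the original code before adding any of this.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI;

# Hypothetical helper (not part of LWP): fetch $url and follow redirects
# manually, giving up once more than $max_hops redirects have been seen.
# Returns the final response and the URL it was fetched from.
sub fetch_following_redirects {
    my ($ua, $url, $max_hops) = @_;

    for (0 .. $max_hops) {
        # simple_request sends one request and never follows redirects itself
        my $res = $ua->simple_request(HTTP::Request->new(GET => $url));

        # 2xx, 4xx and 5xx all end the loop; the caller decides what to do
        return ($res, $url) unless $res->is_redirect;

        # A 3xx without a Location header (e.g. 304) cannot be followed
        my $location = $res->header('Location');
        return ($res, $url) unless defined $location;

        # Location may be relative, so resolve it against the current URL
        $url = URI->new_abs($location, $url)->as_string;
    }

    die "Gave up after more than $max_hops redirects\n";
}

# Example run: only does network I/O when a URL is passed on the command line.
if (my $start = shift @ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 10);
    my ($res, $final_url) = fetch_following_redirects($ua, $start, 5);

    if ($res->is_success) {
        print "Fetched $final_url\n";    # safe to run the link extractor now
    }
    else {
        print "Failed at $final_url: ", $res->status_line, "\n";
    }
}
```

Once the helper returns a successful response, `$res->decoded_content` can be handed to the HTML::LinkExtor parser, with `$final_url` as the base for making links absolute. With WWW::Mechanize most of this collapses to `$mech->get($url)` followed by `$mech->links`.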