Redirection with LWPUserAgent

kazak has asked for the wisdom of the Perl Monks concerning the following question:

Re: Redirection with LWPUserAgent by ikegami (Patriarch) on Dec 18, 2011 at 11:09 UTC
Your question makes no sense. LWP::UserAgent is an HTTP client; it doesn't talk to other HTTP clients, and it does follow redirections.
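To illustrate the point about redirections, here is a minimal sketch of how the redirect chain can be inspected. It makes no network request: the two responses are built by hand to stand in for what `$ua->get()` would return after one hop, and `redirect_locations` is a hypothetical helper name, not part of LWP.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# A fresh agent follows up to 7 redirects unless told otherwise.
my $ua = LWP::UserAgent->new;
print "default max_redirect: ", $ua->max_redirect, "\n";

# Hypothetical helper: the Location header of every hop that led to
# the final response, oldest first.
sub redirect_locations {
    my ($response) = @_;
    return map { scalar $_->header('Location') } $response->redirects;
}

# Hand-built chain (an assumption for the demo): one 302 followed by a 200.
my $hop   = HTTP::Response->new(302, 'Found', [ Location => 'http://example.com/new' ]);
my $final = HTTP::Response->new(200, 'OK');
$final->previous($hop);

print "followed: $_\n" for redirect_locations($final);
```

Calling `$ua->max_redirect(0)` should stop the agent from following redirects at all, which hands the raw 3xx response back to the caller.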
Re^2: Redirection with LWPUserAgent by kazak (Beadle) on Dec 18, 2011 at 12:19 UTC
Sorry, I'm a newbie in Perl, so I think I should explain the following: there is a Perl module, HTTP::Proxy, which according to its author uses LWP::UserAgent among others, so maybe I really did say nonsense; my apologies for this. But the question remains: is this possible using LWP, HTTP::Proxy, or something else? Thanks in advance.
Re^3: Redirection with LWPUserAgent by roboticus (Chancellor) on Dec 18, 2011 at 12:53 UTC
kazak: It looks like HTTP::Proxy will be able to help you with that.
...roboticus. When your only tool is a hammer, all problems look like your thumb.
Re^3: Redirection with LWPUserAgent by ikegami (Patriarch) on Dec 19, 2011 at 05:44 UTC
"Is this possible using LWP/HTTP::Proxy/or something else?"
Is it possible to do what? Have your client obey redirects? It should be, but I guess that depends on your client, and you haven't specified what it is.
Re^3: Redirection with LWPUserAgent by ww (Archbishop) on Dec 18, 2011 at 12:48 UTC
What part of the prior answer is so hard to understand? Would "NO!" be clearer? If you have some reason to regard that answer as less than definitive, please explain some more.
Re: Redirection with LWPUserAgent by CountZero (Bishop) on Dec 18, 2011 at 13:54 UTC
HTTP status message 302 is "Found - The requested page has moved temporarily to a new URL". Why would you want to change that in some cases? And how would you know which redirections are bad and which are good?
CountZero. "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re^2: Redirection with LWPUserAgent by kazak (Beadle) on Dec 18, 2011 at 15:50 UTC
First of all, thanks for your answer. About my question: I'm trying to implement an SEO tool in which some client scripts will use this "proxy" to search for something through the search engines. Something like: request -> search engine -> search result -> destination page. If the destination page returns 404 or something similar, the request must not be considered successful (in such cases we often get a redirection to example.com/sorry/..., example.com/Not Found, etc.). If that happens, the request should be cleaned out (in the way mentioned above) and retried through another search engine (at least the script must try to retrieve it).
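The "bad result" rule described here could be sketched roughly as follows. This is only an illustration under stated assumptions: the URL patterns, `is_bad_result`, and `fake_response` are all hypothetical, and the responses are built by hand so that nothing touches the network. With a real LWP::UserAgent, `$response->request->uri` should be the URL of the final request after any redirects were followed.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical patterns for "sorry" / "not found" landing pages; tune per site.
my @BAD_URL_PATTERNS = ( qr{/sorry/}i, qr{not(?:%20|[\s_+-])*found}i );

# True if the response failed outright, or succeeded but ended up on a
# URL that looks like an error page.
sub is_bad_result {
    my ($response) = @_;
    return 1 unless $response->is_success;
    my $request = $response->request or return 0;
    my $final_url = $request->uri->as_string;
    return ( grep { $final_url =~ $_ } @BAD_URL_PATTERNS ) ? 1 : 0;
}

# Build a response by hand, paired with the request that produced it.
sub fake_response {
    my ($code, $url) = @_;
    my $response = HTTP::Response->new($code);
    $response->request( HTTP::Request->new( GET => $url ) );
    return $response;
}

for my $case ( [ 404, 'http://example.com/page' ],
               [ 200, 'http://example.com/sorry/index.html' ],
               [ 200, 'http://example.com/article' ] ) {
    my $verdict = is_bad_result( fake_response(@$case) ) ? 'bad' : 'ok';
    print "$case->[0] $case->[1]: $verdict\n";
}
```

Matching on URL patterns is a heuristic; a site that serves its error page with status 200 and an ordinary-looking URL would still slip through.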
Re^3: Redirection with LWPUserAgent by CountZero (Bishop) on Dec 19, 2011 at 10:14 UTC
Yes, I think HTTP::Proxy can do that. Your users connect to the server side of HTTP::Proxy; the client side then receives and filters the reply, and the server side issues a redirection to another search engine if the reply is 404 or 410. All other status messages (including the redirection messages) should be passed along "as is", in my opinion. However, there is a hidden snag in your idea: the HTTP protocol is stateless, so when the user clicks on one of the search engine's results and gets a "bad" result, you want to redirect him to another search engine with the original search request, but by that time the server (HTTP::Proxy) has forgotten all about that original search request. You will have to implement some form of session management, or perhaps use a short-lived cookie, to maintain state.
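One possible shape for the state-keeping mentioned above is a small in-memory table keyed by a session token (for example, the value of a short-lived cookie). This is a sketch under assumptions, not HTTP::Proxy code: the function names, the token, and the engine labels are all hypothetical, and a real proxy that forks per connection would need shared storage rather than a plain hash.

```perl
use strict;
use warnings;

# token => { query => ..., tried => [engines], expires => epoch seconds }
my %search_for;

# Record the original query and the engine it was first sent to.
sub remember_search {
    my ($token, $query, $engine, $ttl, $now) = @_;
    $now = time unless defined $now;
    $search_for{$token} = {
        query   => $query,
        tried   => [$engine],
        expires => $now + $ttl,
    };
    return;
}

# Return (engine, query) for the next untried engine, or an empty list
# if the session is unknown, expired, or out of engines.
sub next_engine_for {
    my ($token, $engines, $now) = @_;
    $now = time unless defined $now;
    my $entry = $search_for{$token} or return;
    if ( $entry->{expires} < $now ) {
        delete $search_for{$token};
        return;
    }
    my %tried = map { $_ => 1 } @{ $entry->{tried} };
    my ($next) = grep { !$tried{$_} } @$engines;
    return unless defined $next;
    push @{ $entry->{tried} }, $next;
    return ( $next, $entry->{query} );
}

# Fixed timestamps are passed in so the example is deterministic.
remember_search( 'abc123', 'perl redirect', 'engine-a', 300, 1000 );
my ($engine, $query) = next_engine_for( 'abc123', [ 'engine-a', 'engine-b' ], 1010 );
print "retry '$query' on $engine\n";
```

Passing the clock in as an argument is only there to make expiry easy to test; in normal use the `$now` argument can be left out.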
Re^4: Redirection with LWPUserAgent by kazak (Beadle) on Dec 21, 2011 at 14:40 UTC
Re^5: Redirection with LWPUserAgent by CountZero (Bishop) on Dec 22, 2011 at 14:25 UTC
Some notes below your chosen depth have not been shown here
Re^3: Redirection with LWPUserAgent by Marshall (Canon) on Dec 19, 2011 at 09:52 UTC
Your question has morphed into something else. You have a lot of reading to do, starting with LWP::UserAgent. Right there in the synopsis, it tells you:

    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;    # create the user agent first

    my $response = $ua->get('http://search.cpan.org/');
    if ($response->is_success) {
        print $response->decoded_content;    # or whatever
    }
    else {
        die $response->status_line;
    }

The HTTP status codes tell you that 4xx codes mean "it didn't work"; 404 is "not found", and $response->is_success will be false. So it is easy to tell whether a URL actually led somewhere. LWP will follow the 3xx redirection status codes up to the default of 7 redirects (the LWP page above tells you that too). This behavior can be changed, but it sounds like you don't want to do that.

I am no expert on LWP, but I've written a few of these things, and the complications and weird things that can happen are legion. I will venture to say that your chances of success are approximately 0% if you don't write some experimental code of your own to "play around" with before you tackle your overall project, given that it sounds pretty ambitious. I would write a basic framework and get that working before tackling the boundary cases and the tricky handling of response codes. When you encounter a problem (and you will), try a lot of experiments and then post some code, preferably something simple (a subset of your actual code, using some URL where we can run it to replicate your results).
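As a starting point for the kind of experiment suggested here, the "try another search engine when one fails" loop could be sketched like this. The fetch step is passed in as a code reference so the example runs without network access; `first_successful`, the canned responses, and the engine URLs are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Response;

# Try each URL in order and return (url, response) for the first one
# whose response is a success; return an empty list if none is.
sub first_successful {
    my ($fetch, @urls) = @_;
    for my $url (@urls) {
        my $response = $fetch->($url);
        return ( $url, $response ) if $response->is_success;
        warn "skipping $url: ", $response->status_line, "\n";
    }
    return;
}

# Canned responses standing in for real search engines (assumed URLs).
my %canned = (
    'http://engine-a.example/search?q=perl' => HTTP::Response->new( 404, 'Not Found' ),
    'http://engine-b.example/search?q=perl' => HTTP::Response->new( 200, 'OK' ),
);
my $stub_fetch = sub { $canned{ $_[0] } };

my ($winner, $winning_response) = first_successful( $stub_fetch, sort keys %canned );
print "answered by $winner\n" if defined $winner;
```

With a real agent the stub would be replaced by something like `sub { $ua->get( $_[0] ) }`, where `$ua` is an LWP::UserAgent object.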