in reply to Redirection with LWPUserAgent

HTTP status code 302 is "Found - The requested page has moved temporarily to a new URL".

Why would you want to change that in some cases? And how would you know which redirections are bad and which are good?

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^2: Redirection with LWPUserAgent
by kazak (Beadle) on Dec 18, 2011 at 15:50 UTC

    First of all, thanks for your answer.

    And about my question: I'm trying to implement an SEO tool; some client scripts will use this "proxy" to search for something through the search engines.

    Something like:

    request->search engine->search result->destination page.

    If the destination page returns 404 or something similar, the request must not be considered successful (in such cases we often get redirected to example.com/sorry/..., example.com/Not Found, etc.). If that happens, the request should be cleaned up (in the way mentioned above) and retrieved through another search engine (at least the script must try to retrieve it).
      Yes, I think HTTP::Proxy can do that. Your users connect to the server side of HTTP::Proxy; the client side then receives and filters the reply, and the server side issues a redirection to another search engine if the reply is 404 or 410. All other status codes (including the redirection codes) should be passed along "as is", in my opinion.

      However, there is a hidden snag in your idea: the HTTP protocol is stateless. When the user clicks on one of the search engine's results and gets a "bad" result, you want to redirect him to another search engine with the original search request, but by that time the server (HTTP::Proxy) has forgotten all about that original search request. You will have to implement some form of session management, or perhaps use a short-lived cookie, to maintain state.
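To make the state problem concrete, here is a minimal sketch of remembering the original search under a token that could travel in a short-lived cookie. Everything in it is an assumption for illustration: the `%pending` store, the function names `remember_query` and `recall_query`, and the 300-second default lifetime are all hypothetical, and the token is not cryptographically strong.

```perl
use strict;
use warnings;

# Hypothetical in-memory store: token => { query => ..., expires => epoch }.
# A real proxy would need something that survives restarts and forks.
my %pending;

# Remember the original search under a fresh token and return the token.
# $ttl (seconds) and $now (epoch) are optional; $now exists to ease testing.
sub remember_query {
    my ($query, $ttl, $now) = @_;
    $ttl //= 300;
    $now //= time;
    # rand() is NOT a secure source of randomness; illustration only.
    my $token = sprintf '%08x%08x', int(rand(2**32)), int(rand(2**32));
    $pending{$token} = { query => $query, expires => $now + $ttl };
    return $token;
}

# Look the search up again; unknown or expired tokens yield undef.
sub recall_query {
    my ($token, $now) = @_;
    return unless defined $token;
    $now //= time;
    my $entry = $pending{$token};
    return unless $entry;
    if ($entry->{expires} < $now) {
        delete $pending{$token};    # forget stale searches
        return;
    }
    return $entry->{query};
}
```

The proxy would call `remember_query` when it forwards the search, send the token back to the client, and call `recall_query` when a "bad" destination page comes back and the search has to be repeated elsewhere.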

      CountZero


        Or maybe we can choose another way and implement it with a redirection. I mean, when the answer is "302 example.com/bad_result/...", just change the "Location" header and return it to the client. That way we avoid the additional headache of sessions and cookies. What do you think?
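As a rough sketch of the "just change the Location header" idea, the function below inspects an HTTP::Response object and, if it is a redirect to a "bad" page, points it somewhere else. The patterns in `@BAD_TARGET`, the function name `rewrite_bad_redirect`, and the fallback URL are assumptions made up for illustration; how the fallback URL gets the original search terms is exactly the state question raised above.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical patterns that mark a redirect target as "bad" (assumptions).
my @BAD_TARGET = (qr{/sorry/}i, qr{/bad_result/}i, qr{not.?found}i);

# If $response is a 3xx redirect whose Location matches a "bad" pattern,
# replace the Location header with $fallback_url.
# Returns 1 when the header was rewritten, 0 otherwise.
sub rewrite_bad_redirect {
    my ($response, $fallback_url) = @_;
    return 0 unless $response->is_redirect;

    my $location = $response->header('Location');
    return 0 unless defined $location;
    return 0 unless grep { $location =~ $_ } @BAD_TARGET;

    $response->header(Location => $fallback_url);
    return 1;
}
```

For example, a response built as `HTTP::Response->new(302, 'Found', [Location => 'http://example.com/bad_result/x'])` would have its Location replaced, while a redirect to an ordinary page would be left alone.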
      Your question has morphed into something else. You have a lot of reading to do, starting with LWP::UserAgent.

      Right there in the synopsis, it tells you:

      require LWP::UserAgent;
      my $ua = LWP::UserAgent->new;

      my $response = $ua->get('http://search.cpan.org/');

      if ($response->is_success) {
          print $response->decoded_content;  # or whatever
      }
      else {
          die $response->status_line;
      }

      The HTTP status codes tell you that 4xx codes mean "it didn't work". 404 is "not found", and $response->is_success will be false. So it is easy to tell whether a URL actually led somewhere.
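To illustrate how little code the status check needs, here is a small sketch that sorts a response into the cases discussed in this thread. The function name `classify_response` and its return labels are hypothetical; treating 404 and 410 as "try another search engine" follows the suggestion made earlier in the thread, not anything LWP prescribes.

```perl
use strict;
use warnings;
use HTTP::Response;

# Sort a response into one of four hypothetical buckets:
#   ok              - 2xx, the page was retrieved
#   retry_elsewhere - 404 or 410, worth trying another search engine
#   redirect        - a 3xx that was not followed
#   failed          - any other failure
sub classify_response {
    my ($response) = @_;
    return 'ok' if $response->is_success;

    my $code = $response->code;
    return 'retry_elsewhere' if $code == 404 || $code == 410;
    return 'redirect'        if $response->is_redirect;
    return 'failed';
}
```

Responses can be built by hand for experiments, e.g. `classify_response(HTTP::Response->new(404, 'Not Found'))`, so this logic can be tried out without touching the network.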

      LWP will follow 3xx redirects up to the default limit of 7 (the max_redirect setting; the LWP::UserAgent page above tells you that too). This behavior can be changed, but it sounds like you don't want to do that.
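For experimenting with the redirect behavior, the sketch below shows two things: a user agent created with `max_redirect => 0` so that 3xx responses are handed back instead of followed, and a helper that lists where a normally followed request was redirected along the way, using the `redirects` method of HTTP::Response. The function names and the 10-second timeout are assumptions for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Build a user agent that does not follow redirects by itself, so every
# 3xx response can be inspected by the caller.
sub make_manual_redirect_ua {
    return LWP::UserAgent->new(max_redirect => 0, timeout => 10);
}

# For a response obtained with automatic redirect following, return the
# Location targets of the intermediate 3xx responses, earliest first.
sub redirect_chain {
    my ($response) = @_;
    return map { scalar $_->header('Location') } $response->redirects;
}
```

With the manual agent, `my $res = make_manual_redirect_ua()->get($url);` returns the 302 itself, and `$res->header('Location')` shows where the server wanted to send you.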

      I am no expert on LWP, but I've written a few of these things - the complications and weird things that can happen are legion. I will venture to say that your chances of success are approximately 0% if you don't write some experimental code on your own to "play around" before you tackle your overall project given that it sounds pretty ambitious. I would write a basic framework and get that working before tackling the boundary cases and tricky handling of response codes!

      When you encounter a problem (and you will), try a lot of experiments and then post some code, preferably something simple (a subset of your actual code, using some URL where we can run it to replicate your results).