kazak has asked for the wisdom of the Perl Monks concerning the following question:

Hi all.

How can I redirect an HTTP client with LWP::UserAgent?

For example, a client tries to get example.com. If the response is "302 example.com/good", change nothing; but if the response is "302 example.com/bad", cut "/bad" off the URI and redirect the client to "example.com". Thanks in advance.

Re: Redirection with LWPUserAgent
by ikegami (Patriarch) on Dec 18, 2011 at 11:09 UTC
    Your question makes no sense. LWP::UserAgent is an HTTP client, it doesn't talk to other HTTP clients, and it does follow redirections.

      Sorry, I'm a newbie in Perl.

      I think I should explain the following: there is a Perl module, HTTP::Proxy, which according to its author uses LWP::UserAgent among others, so maybe I really was talking nonsense; my apologies for that. But the question remains: Is this possible using LWP/HTTP::Proxy/or something else?

      Thanks in advance.

        kazak:

        It looks like HTTP::Proxy will be able to help you with that.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

        Is this possible using LWP/HTTP::Proxy/or something else?

        Is it possible to do what? Have your client obey redirects? It should be, but I guess that depends on your client, and you haven't specified what it is.

        What part of the prior answer is so hard to understand? Would "NO!" be clearer?

        If you have some reason to regard that answer as less than definitive, please explain some more.

Re: Redirection with LWPUserAgent
by CountZero (Bishop) on Dec 18, 2011 at 13:54 UTC
    HTTP status message 302 is "Found - The requested page has moved temporarily to a new url".

    Why would you want to change that in some cases? And how would you know which redirections are bad and which are good?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      First of all, thanks for your answer.

      About my question: I'm trying to implement an SEO tool. Some client scripts will use this "proxy" to search for something through the search engines.

      Something like:

      request->search engine->search result->destination page.

      If the destination page returns 404 or something similar, the request must not be considered successful (in such cases we often get a redirection to example.com/sorry/..., example.com/Not Found, etc.). If that happens, the request should be cleaned up (in the way mentioned above) and retrieved through another search engine (at least the script must try to retrieve it).
        Yes, I think HTTP::Proxy can do that. Your users connect to the server side of HTTP::Proxy and the client side then receives and filters the reply and the server side issues a redirection to another search engine if the reply is 404 or 410. All other status messages (including the redirection messages) should be passed along "as is" in my opinion.
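        A minimal, untested sketch of that idea, assuming HTTP::Proxy's documented filter interface (push_filter with an HTTP::Proxy::HeaderFilter::simple response filter). The port number, the fallback URL and the helper names needs_fallback, fallback_url and run_proxy are all made up for illustration, and whether rewriting the status code inside a header filter behaves well with every client is something you would have to try out:

        ```perl
        use strict;
        use warnings;

        # Status codes treated as "dead" destination pages (404 and 410, as above).
        my %DEAD = map { $_ => 1 } ( 404, 410 );

        # True if a response with this status code should be replaced by a redirect.
        sub needs_fallback {
            my ($code) = @_;
            return $DEAD{$code} ? 1 : 0;
        }

        # Hypothetical: where to send the client instead. A real tool would need
        # the original search request here.
        sub fallback_url {
            return 'http://search.example.org/';
        }

        sub run_proxy {
            # Loaded lazily so the helpers above can be tried without HTTP::Proxy.
            require HTTP::Proxy;
            require HTTP::Proxy::HeaderFilter::simple;

            my $proxy = HTTP::Proxy->new( port => 3128 );    # port is an assumption

            $proxy->push_filter(
                mime     => undef,    # apply to every response, not only text/*
                response => HTTP::Proxy::HeaderFilter::simple->new(
                    sub {
                        my ( $self, $headers, $message ) = @_;

                        # Pass everything else (including 3xx) along "as is".
                        return unless needs_fallback( $message->code );

                        # Turn the dead page into a temporary redirect.
                        $message->code(302);
                        $message->message('Found');
                        $headers->header( Location => fallback_url() );
                    }
                ),
            );

            $proxy->start;
        }

        # Start the proxy only when asked to: perl proxy.pl start
        run_proxy() if @ARGV && $ARGV[0] eq 'start';
        ```

        The decision is kept in needs_fallback so it can be tested without starting the proxy.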

        However, there is a hidden snag in your idea: the HTTP protocol is stateless. When the user clicks on one of the search engine's results and gets a "bad" result, you want to redirect him to another search engine with the original search request, but by that time the server (HTTP::Proxy) has forgotten all about that original search request. You will have to implement some form of session management, or perhaps use a short-lived cookie, to maintain state.

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        Your question has morphed into something else. You have a lot of reading to do, starting with LWP::UserAgent.

        Right there in the synopsis, it tells you:

        my $response = $ua->get('http://search.cpan.org/');
        if ($response->is_success) {
            print $response->decoded_content;  # or whatever
        }
        else {
            die $response->status_line;
        }
        The HTTP status codes tell you that 4xx codes mean "it didn't work". 404 is "not found", and $response->is_success will be false. So it is easy to tell whether a URL actually leads somewhere.
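        To play with the status classes without touching the network, something like the following might help. It is only a sketch: it uses the classification functions exported by HTTP::Status (part of the LWP family), and describe_status is a made-up helper name:

        ```perl
        use strict;
        use warnings;
        use HTTP::Status qw(:is status_message);

        # Return a one-line description such as "404 Not Found (client error)".
        sub describe_status {
            my ($code) = @_;
            my $kind =
                is_success($code)      ? 'success'
              : is_redirect($code)     ? 'redirect'
              : is_client_error($code) ? 'client error'
              : is_server_error($code) ? 'server error'
              :                          'other';
            my $text = status_message($code) // 'Unknown';
            return sprintf '%d %s (%s)', $code, $text, $kind;
        }

        print describe_status($_), "\n" for 200, 302, 404, 410, 500;
        ```

        On a real response object the same questions are answered by $response->is_success, $response->is_redirect and $response->is_error.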

        LWP will follow 3xx redirections, up to a default limit of 7 (the LWP page above tells you that too). This behavior can be changed, but it sounds like you don't want to do that.
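        If you do want to look at a redirect yourself, one way (a sketch only) is to create the user agent with max_redirect => 0 so that the 3xx response comes back to you, then inspect its Location header. The helpers cleaned_redirect and fetch_without_following are made-up names, the "/bad" path comes from the original question, and the code assumes the Location header holds an absolute URL:

        ```perl
        use strict;
        use warnings;
        use HTTP::Response;
        use URI;

        # Fetch a URL without following redirects, so a 3xx response is returned.
        sub fetch_without_following {
            my ($url) = @_;
            require LWP::UserAgent;    # loaded lazily; not needed for the demo below
            my $ua = LWP::UserAgent->new( max_redirect => 0 );
            return $ua->get($url);
        }

        # For a 3xx response, return the redirect target; if its path is one of
        # the "bad" paths, strip the path so the target is the site root.
        sub cleaned_redirect {
            my ( $response, @bad_paths ) = @_;
            return unless $response->is_redirect;

            my $location = $response->header('Location') or return;
            my $uri = URI->new($location);    # assumes an absolute URL

            for my $bad (@bad_paths) {
                if ( $uri->path eq $bad ) {
                    $uri->path('/');
                    last;
                }
            }
            return $uri->as_string;
        }

        # Offline demonstration with hand-built responses.
        for my $target ( 'http://example.com/good', 'http://example.com/bad' ) {
            my $res = HTTP::Response->new( 302, 'Found', [ Location => $target ] );
            print cleaned_redirect( $res, '/bad' ), "\n";
        }
        ```

        With a live site you would pass the result of fetch_without_following($url) to cleaned_redirect instead of a hand-built response.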

        I am no expert on LWP, but I've written a few of these things - the complications and weird things that can happen are legion. I will venture to say that your chances of success are approximately 0% if you don't write some experimental code on your own to "play around" before you tackle your overall project given that it sounds pretty ambitious. I would write a basic framework and get that working before tackling the boundary cases and tricky handling of response codes!

        When you encounter a problem (and you will), try a lot of experiments and then post some code, preferably something simple (subset of your actual code and use some URL where we can run your code to replicate your results).