in reply to Re: Redirection with LWPUserAgent
in thread Redirection with LWPUserAgent

First of all, thanks for your answer.

As for my question: I'm trying to implement an SEO tool. Some client scripts will use this "proxy" to search for something through the search engines.

Something like:

request->search engine->search result->destination page.

If the destination page returns 404 or something similar, the request must not be considered successful. (In such cases we often get redirected to something like example.com/sorry/... or example.com/Not Found.) If that happens, the request should be cleaned out (in the way mentioned above) and retried through another search engine (or at least the script must try to retrieve it).

Re^3: Redirection with LWPUserAgent
by CountZero (Bishop) on Dec 19, 2011 at 10:14 UTC
    Yes, I think HTTP::Proxy can do that. Your users connect to the server side of HTTP::Proxy; the client side then receives and filters the reply, and the server side issues a redirection to another search engine if the reply is 404 or 410. All other status messages (including the redirection messages) should be passed along "as is", in my opinion.

    However, there is a hidden snag in your idea: the HTTP protocol is stateless. When the user clicks on one of the search engine's results and gets a "bad" result, you want to redirect him to another search engine with the original search request, but by that time the server (HTTP::Proxy) has forgotten all about that original search request. You will have to implement some form of session management, or perhaps use a short-lived cookie to maintain state.
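
    To make the short-lived cookie idea a bit more concrete, here is a minimal sketch. It is not tied to HTTP::Proxy; the helper names remember_query and recall_query, the cookie name orig_query and the 300 second lifetime are all assumptions made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;
use URI::Escape qw(uri_escape uri_unescape);

# Hypothetical helper: attach a short-lived cookie that stores the
# original search query on a response going back to the client.
# The cookie name and the 300 second lifetime are arbitrary choices.
sub remember_query {
    my ($response, $query) = @_;
    $response->push_header(
        'Set-Cookie' => sprintf(
            'orig_query=%s; Max-Age=300; Path=/', uri_escape($query)
        ),
    );
    return $response;
}

# Hypothetical helper: recover the query from the Cookie header of a
# later request. Returns undef if the cookie is not present.
sub recall_query {
    my ($cookie_header) = @_;
    return unless defined $cookie_header;
    for my $pair (split /;\s*/, $cookie_header) {
        my ($name, $value) = split /=/, $pair, 2;
        return uri_unescape($value)
            if $name eq 'orig_query' && defined $value;
    }
    return;
}
```

    When a "bad" result comes back, the proxy could call recall_query on the incoming Cookie header to find out what to send to the next search engine.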

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Or maybe we can go another way and implement it with a redirection. I mean, when the answer is "302 example.com/bad_result/...", just change the "Location" header and return it to the client. That way we can avoid the additional headache of sessions/cookies. What do you think?
        But what will you put in the location header?

        CountZero


Re^3: Redirection with LWPUserAgent
by Marshall (Canon) on Dec 19, 2011 at 09:52 UTC
    Your question has morphed into something else. You have a lot of reading to do, starting with LWP::UserAgent.

    Right there in the synopsis, it tells you:

        my $response = $ua->get('http://search.cpan.org/');

        if ($response->is_success) {
            print $response->decoded_content;  # or whatever
        }
        else {
            die $response->status_line;
        }

    The HTTP status codes tell you that 4xx codes mean "it didn't work"; 404 is "not found". In that case $response->is_success will be false, so it is easy to tell whether a URL actually led somewhere.
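
    As a rough sketch of how that check could be extended to the "sorry" pages mentioned in the question (pages that answer 200 but are really error pages), something like the following might do. The helper name looks_successful and the URL patterns are assumptions for illustration, not anything LWP provides, and the patterns will need tuning against the sites you actually hit.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Assumed patterns for "soft" error pages; these are guesses based on
# the examples in the question (example.com/sorry/..., "Not Found").
my @BAD_URL = (qr{/sorry/}i, qr{not(?:\s|_|-|%20)*found}i);

# Hypothetical helper: true only if the status is 2xx AND the final URL
# does not look like an error page.
sub looks_successful {
    my ($response) = @_;
    return 0 unless $response->is_success;    # 4xx, 5xx, unfollowed 3xx

    # After redirects, $response->request is the request that produced
    # the final response, so its URI is the page we ended up on.
    my $request = $response->request or return 1;
    my $final_url = $request->uri->as_string;

    for my $pattern (@BAD_URL) {
        return 0 if $final_url =~ $pattern;
    }
    return 1;
}
```

    With a real user agent you would call it as looks_successful($ua->get($url)) and move on to the next search engine when it returns false.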

    LWP will follow 3xx redirects up to the default limit of 7 (the LWP::UserAgent page above tells you that too). This behavior can be changed, but it sounds like you don't want to do that.
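
    In case you do want to look at redirects yourself, here is a minimal sketch of turning automatic following off via max_redirect and reading the Location header. The helper name redirect_target and the 10 second timeout are assumptions for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# max_redirect => 0 makes LWP hand back the 3xx response itself
# instead of following it. The timeout value is an arbitrary choice.
my $ua = LWP::UserAgent->new(max_redirect => 0, timeout => 10);

# Hypothetical helper: return the redirect target of a 3xx response,
# or undef if the response is not a redirect at all.
# Note that the Location value may be a relative URL.
sub redirect_target {
    my ($response) = @_;
    return unless $response->is_redirect;
    return $response->header('Location');
}

# Typical use (needs network access, so it is left commented out):
#   my $response = $ua->get('http://search.cpan.org/');
#   my $target   = redirect_target($response);
#   print "would redirect to $target\n" if defined $target;
```

    If you leave redirects switched on instead, $response->redirects gives you the list of intermediate responses that led to the final one.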

    I am no expert on LWP, but I've written a few of these things - the complications and weird things that can happen are legion. I will venture to say that your chances of success are approximately 0% if you don't write some experimental code on your own to "play around" before you tackle your overall project given that it sounds pretty ambitious. I would write a basic framework and get that working before tackling the boundary cases and tricky handling of response codes!

    When you encounter a problem (and you will), try a lot of experiments and then post some code, preferably something simple (subset of your actual code and use some URL where we can run your code to replicate your results).