in reply to Re^2: Redirection with LWPUserAgent
in thread Redirection with LWPUserAgent

Yes, I think HTTP::Proxy can do that. Your users connect to the server side of HTTP::Proxy; its client side fetches the page, then receives and filters the reply, and the server side issues a redirection to another search engine if the reply is 404 or 410. All other status codes (including redirections) should, in my opinion, be passed along "as is".

However, there is a hidden snag in your idea: the HTTP protocol is stateless. When the user clicks on one of the search engine's results and gets a "bad" result, you want to redirect him to another search engine with the original search request, but by that time the server (HTTP::Proxy) has forgotten all about that original search request. You will have to implement some form of session management, or perhaps use a short-lived cookie, to maintain state.
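
To illustrate the short-lived cookie idea, here is a minimal sketch in plain Perl. It is only one possible approach, not something HTTP::Proxy provides: the cookie name "last_search", the 300-second lifetime, and all the helper names are hypothetical, and the encoding assumes the query is a plain byte string. It shows how the original search could be packed into a Set-Cookie value and read back from a later Cookie header.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
sub encode_value {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Reverse of encode_value.
sub decode_value {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Build a Set-Cookie header value that remembers the search for a short time.
# The name "last_search" and the default lifetime are hypothetical choices.
sub remember_search_cookie {
    my ( $query, $max_age ) = @_;
    $max_age = 300 unless defined $max_age;
    return sprintf( 'last_search=%s; Max-Age=%d; Path=/',
        encode_value($query), $max_age );
}

# Recover the search from the Cookie header the browser sends back.
# Returns nothing (undef in scalar context) when the cookie is absent.
sub recall_search {
    my ($cookie_header) = @_;
    return unless defined $cookie_header;
    for my $pair ( split /;\s*/, $cookie_header ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        return decode_value($value)
            if defined $value && $name eq 'last_search';
    }
    return;
}
```

Keep in mind that a browser returns a cookie only to the site that set it, so this covers follow-up requests to the same site. For results hosted elsewhere, a table inside the proxy keyed by client would be the session-management alternative.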

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^4: Redirection with LWPUserAgent
by kazak (Beadle) on Dec 21, 2011 at 14:40 UTC
    Or maybe we can go another way and implement it with a redirection. I mean, when the answer is "302 example.com/bad_result/...", we just change the "Location" header and return it to the client. That way we avoid the additional headache of sessions and cookies. What do you think?
      But what will you put in the location header?

      CountZero


        Well, my guess is that we need to modify the "bad" header. We get a "bad" header when the initial request is redirected to some "dummy" location. Such locations are pretty common in my case (.../sorry... ; .../Error/...; etc.), so we just need to rebuild the initial request from the "bad" one. For example, this is a bad redirect:

        http://www.google.co.uk/sorry/?continue=http://www.google.co.uk/search%3Fq%3Djust+an+example

        We can detect it by the /sorry/ in the middle. This link leads straight to the captcha page, which is not what we need. We need just:

        http://www.google.co.uk/search%3Fq%3Djust+an+example
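
The rewrite described above can be tried in isolation, without a running proxy. The sketch below uses core Perl only. The function name real_location is hypothetical, and it additionally assumes that the percent-escapes in the "continue" value should be decoded (%3F back to "?", %3D back to "="), so that the result is a usable search URL.

```perl
use strict;
use warnings;

# Return the target hidden in a ".../sorry/?continue=..." redirect,
# or nothing (undef in scalar context) when the Location is not one.
sub real_location {
    my ($location) = @_;
    return unless defined $location;
    return unless $location =~ m{/sorry/\?continue=(.+)\z}s;

    my $target = $1;

    # Assumption: the continue value is percent-encoded and should be decoded.
    $target =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $target;
}

my $bad = 'http://www.google.co.uk/sorry/?continue='
        . 'http://www.google.co.uk/search%3Fq%3Djust+an+example';
print real_location($bad), "\n";
# prints http://www.google.co.uk/search?q=just+an+example
```

A Location that does not contain /sorry/?continue= yields undef, so the caller can leave such redirects untouched.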

        So we extract the "bad" location from the "Location" header and replace it with a "good" one. Although I don't have much experience, I tried to write some code; at least I'd like to believe that it's code and not just a mess :). I'm also stuck on one thing: I can't figure out how to pass additional parameters to HTTP::Proxy once the new() method has already been called.
        #!/usr/bin/perl
        use strict;
        use warnings;
        use HTTP::Proxy qw( :log );
        use HTTP::Proxy::HeaderFilter::simple;
        use LWP::UserAgent;

        # user agent the proxy uses for outgoing requests (through the parent proxy)
        my $ua = LWP::UserAgent->new();
        $ua->proxy( ['http'], 'http://127.0.0.1:29999' );
        $ua->timeout(10);
        $ua->agent('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24');

        # ##<----- 2)
        open( my $log_fh, '>>', '/var/log/repeater.log' )
            or die "Cannot open log file: $!";

        my $proxy = HTTP::Proxy->new(
            port  => 38374,
            agent => $ua,
            logfh => $log_fh,    # pass the handle itself; <LOGFILE> would read a line from it
        );
        #HTTP::Proxy->new(@ARGV); ### <--3)
        $proxy->logmask( ALL );

        $proxy->push_filter(
            host     => 'google.com',    # only apply to this domain
            response => HTTP::Proxy::HeaderFilter::simple->new(
                sub {
                    my ( $self, $headers, $response ) = @_;

                    # skip non redirects
                    return if $response->code !~ /^3/;

                    # pick up location
                    my $location = $headers->header('Location');
                    return unless defined $location;

                    # find bad redirections
                    if ( $location =~ m{google\.com/sorry/\?continue=(.*)} ) {

                        # change the redirect: keep only what follows "continue="
                        my $new_location = $1;
                        $headers->header( Location => $new_location );

                        # print some logging information
                        $self->proxy->log( ALL, LOCATION => "$location => $new_location" );
                    }
                }
            )
        );

        $proxy->start;
        P.S. The clients operate through parent proxies, so it makes sense to try to repeat the request. Thanks in advance, Sergey.