in reply to Re^4: Redirection with LWPUserAgent
in thread Redirection with LWPUserAgent

But what will you put in the location header?

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^6: Redirection with LWPUserAgent
by kazak (Beadle) on Dec 23, 2011 at 20:04 UTC
    Well, my guess is that we need to modify our "bad" header. We get the "bad" header when the initial request is redirected to some "dummy" location. Such locations are pretty common in my case (.../sorry...; .../Error/...; etc.), so we just need to recover the initial request from the "bad" one. For example, this is a bad request:

    http://www.google.co.uk/sorry/?continue=http://www.google.co.uk/search%3Fq%3Djust+an+example

    We can detect it by the /sorry/ in the middle. This link leads straight to the captcha request page, which is not what we need. We need just:

    http://www.google.co.uk/search%3Fq%3Djust+an+example
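
    As a minimal sketch of that extraction step, the snippet below pulls the continue= target out of a /sorry/ URL and percent-decodes it. The helper name unwrap_sorry_redirect is hypothetical (it is not from any module), and the sketch assumes the target is always carried in a continue= query parameter, as in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: given a Location value,
# return the URL hidden in "continue=" when the value points at a
# ".../sorry/..." page, or the value unchanged otherwise.
sub unwrap_sorry_redirect {
    my ($location) = @_;
    return $location unless defined $location;

    # Match "/sorry/?...continue=<target>" and capture the target.
    if ( $location =~ m{/sorry/\?(?:.*&)?continue=([^&]+)} ) {
        my $target = $1;

        # Percent-decode (%3F -> "?", %3D -> "="); "+" is left untouched.
        $target =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        return $target;
    }
    return $location;
}

my $bad = 'http://www.google.co.uk/sorry/?continue='
        . 'http://www.google.co.uk/search%3Fq%3Djust+an+example';
print unwrap_sorry_redirect($bad), "\n";
```

    Decoding matters here because the wrapped URL arrives with its "?" and "=" escaped; without it the result would not be a usable search URL.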

    So we extract the "bad" location from the "Location" header and replace it with a "good" one. Although I don't have much experience, I tried to write some code; at least I'd like to believe that it's code and not just a mess :). I'm also stuck on one thing: I can't figure out how to pass additional parameters to HTTP::Proxy once the new() method has already been used.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTTP::Proxy qw( :log );
    use HTTP::Proxy::HeaderFilter::simple;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new();
    $ua->proxy(['http'],'http://127.0.0.1:29999');
    $ua->timeout(10);
    $ua->agent('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24');

    open ( LOGFILE, ">>", "/var/log/repeater.log");    # ##<----- 2)

    my $proxy = HTTP::Proxy->new(
        port  => '38374',
        agent => $ua,
        logfh => \*LOGFILE,    # pass the handle itself; <LOGFILE> would read a line from it
    );
    #HTTP::Proxy->new(@ARGV); ### <--3)
    $proxy->logmask( ALL );

    $proxy->push_filter(
        host     => 'google.com',    # only apply to this domain
        response => HTTP::Proxy::HeaderFilter::simple->new(
            sub {
                my ( $self, $headers, $response ) = @_;

                # skip non redirects
                return if $response->code !~ /^3/;

                # pick up location
                my $location = $headers->header('Location');

                # find bad redirections
                if ( $location =~ m{google.com/sorry.*} ) {

                    # change the redirect
                    my $new_location = $location ;
                    $new_location =~ s/.*(\/sorry\/\?continue=.*)/$1/gx ;
                    $new_location =~ s/\/sorry\/\?continue=//;
                    $headers->header( Location => $new_location );

                    # print some logging information
                    $self->proxy->log( ALL, LOCATION => "$location => $new_location" );
                }
            }
        )
    );

    $proxy->start;
    P.S. The clients operate through parent proxies, so it makes sense to try to repeat the request. Thanks in advance, Sergey.