gecko has asked for the wisdom of the Perl Monks concerning the following question:
The documentation for the hook says:

    invoke-after-get
    This hook function is invoked immediately after the robot makes each GET request. This means your hook function will see every type of response, not just successful GETs.

All fine and dandy when you're dealing with normal stuff. The problem is that I need to check whether a response is a redirect (a 301 or a 302). Robots NEVER returns a 301 or a 302, always a 200 (Success), even on redirected pages. For example, I have a page locally which redirects to Google:

    C:\Documents and Settings\gecko\Desktop>nc localhost 80
    GET /cgi-bin/redirect.pl HTTP/1.1
    host:localhost

    HTTP/1.1 302 Moved
    Date: Sun, 17 Jun 2007 02:34:36 GMT
    Server: Apache/2.2.4 (Win32)
    Location: http://www.google.com
    Content-Length: 0
    Content-Type: text/plain

Yet when I do it with Robots:

    ORIG_URL: http://127.0.0.1/cgi-bin/redirect.pl
    URL: Cache-Control
    Date
    Server
    Content-Type
    Client-Date
    Client-Peer
    Client-Response-Num
    Client-Transfer-Encoding
    Set-Cookie
    Title

    RESPONSE: 200

    URL: http://www.google.com/
    ORIG_URL: http://127.0.0.1/cgi-bin/redirect.pl
    RESPONSE: 200
    SIZE: 5799
    TITLE: Google

As you can see in the Robots output, there isn't even a "Location" field, so it seems to be following the redirect automatically, even though the hook is defined like this:

    'invoke-after-get' => sub {
        my($robot, $hook, $url, $response) = @_;
        if (DEBUG) {
            print "ORIG_URL: $url\n";
            print "URL: ";
            for ($response->header_field_names) {
                print +"$_\n";
            }
            print "\n";
            print "RESPONSE: " . $response->code . "\n";
            print "\n";
        }
        # (the rest of the hook is cut off in the post)

How do you recommend I detect a 301/302 in this case? Thanks, monks!
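A minimal sketch of one way to see the redirect, assuming the robot fetches pages through LWP::UserAgent (an assumption about the module's internals, not something shown in the post). LWP::UserAgent follows redirects for GET requests itself and links each earlier response to the final one through `previous`, so the 302 may still be reachable from the 200 that the hook receives. The helper name `redirect_chain` is hypothetical, and the two responses are built by hand to mirror the `nc` output rather than fetched over the network.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper (not part of WWW::Robot or LWP): return the redirect
# responses that led up to $response, oldest first, as [code, location] pairs.
sub redirect_chain {
    my ($response) = @_;
    my @chain;
    # Walk backwards through the chain of earlier responses.
    for (my $r = $response->previous; $r; $r = $r->previous) {
        unshift @chain, [ $r->code, scalar $r->header('Location') ]
            if $r->is_redirect;
    }
    return @chain;
}

# Offline demonstration: a hand-built 302 followed by the final 200.
my $moved = HTTP::Response->new(302, 'Moved', [ Location => 'http://www.google.com' ]);
my $final = HTTP::Response->new(200, 'OK');
$final->previous($moved);

for my $hop (redirect_chain($final)) {
    print "REDIRECT: $hop->[0] -> $hop->[1]\n";
}
```

Inside the `invoke-after-get` hook, the same helper could be called on `$response`. If it returns an empty list there, the assumption about how the robot makes its requests does not hold and a different approach is needed.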
Replies are listed 'Best First'.
Re: WWW::Robots problem
by merlyn (Sage) on Jun 17, 2007 at 03:18 UTC