Howdy, I have code that should recognize a redirect when working with WWW::Robots. The gist is this: Robots has a number of hooks along the way that let you edit, change, or check the content. The one right after the GET request is 'invoke-after-get'. Here is my hook for testing purposes:
    'invoke-after-get' => sub {
        my ($robot, $hook, $url, $response) = @_;
        if (DEBUG) {
            print "ORIG_URL: $url\n";
            # List the header field names of the response
            print "URL: ";
            for ($response->header_field_names) {
                print +"$_\n";
            }
            print "\n";
            print "RESPONSE: " . $response->code . "\n";
            print "\n";
        }
    },
All fine and dandy when you're dealing with normal pages. The problem is that I need to check whether the response is a redirect (a 301 or a 302). Robots never returns a 301 or a 302, always a 200 (Success), even on redirected pages. For example, I have a page locally which redirects to Google:
    C:\Documents and Settings\gecko\Desktop>nc localhost 80
    GET /cgi-bin/redirect.pl HTTP/1.1
    host:localhost

    HTTP/1.1 302 Moved
    Date: Sun, 17 Jun 2007 02:34:36 GMT
    Server: Apache/2.2.4 (Win32)
    Location: http://www.google.com
    Content-Length: 0
    Content-Type: text/plain
Yet when I do it with Robots:
    ORIG_URL: http://127.0.0.1/cgi-bin/redirect.pl
    URL: Cache-Control
    Date
    Server
    Content-Type
    Client-Date
    Client-Peer
    Client-Response-Num
    Client-Transfer-Encoding
    Set-Cookie
    Title

    RESPONSE: 200

    URL: http://www.google.com/
    ORIG_URL: http://127.0.0.1/cgi-bin/redirect.pl
    RESPONSE: 200
    SIZE: 5799
    TITLE: Google
As you can see, the Robots output doesn't even have a "Location" field, so it seems to be following the redirect automatically, even though the hook is documented like this:
    invoke-after-get
        This hook function is invoked immediately after the robot makes each GET request. This means your hook function will see every type of response, not just successful GETs.
How do you recommend I detect a 301/302 in this case? Thanks, monks!
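
One direction that might be worth testing, sketched under assumptions: if the $response handed to the hook is an ordinary HTTP::Response produced by LWP, then a redirect that was followed automatically should still be reachable through the response's previous() method, which returns the response that led to the current one. The helper names redirect_chain and describe_redirects below are hypothetical and not part of WWW::Robots, and whether Robots passes the final response object to the hook unchanged is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Robots): collect the responses
# that were followed before arriving at $response.
# Assumes $response behaves like an HTTP::Response, i.e. previous()
# returns the response that led to this one, or undef if there was none.
sub redirect_chain {
    my ($response) = @_;
    my @chain;
    for (my $r = $response->previous; defined $r; $r = $r->previous) {
        unshift @chain, $r;    # keep the oldest redirect first
    }
    return @chain;
}

# Hypothetical helper: one "CODE -> Location" string per followed redirect.
sub describe_redirects {
    my ($response) = @_;
    return map {
        my $location = $_->header('Location');
        $location = '(no Location header)' unless defined $location;
        $_->code . ' -> ' . $location;
    } redirect_chain($response);
}

# Possible use inside the 'invoke-after-get' hook (untested assumption):
#   print "REDIRECT: $_\n" for describe_redirects($response);
```

If the chain comes back empty even for the redirecting page, the assumption about the response object does not hold, and the redirect would have to be caught elsewhere, for example by limiting redirects on the underlying user agent.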

In reply to WWW::Robots problem by gecko