in reply to How to implement a fourth protocol

One related idea I had at lunchtime is this: suppose the bot attempts to crawl your directory structure, and you've detected it's a bad bot. Is there a way to make its client machine cache a recursive directory structure (or a false picture of one, induced by artificial responses)? I'm not sure how caching works at a lower level - it's just a functional idea. But if it worked, the bot would be crawling its own cache forever, rather than bothering your real machine with it.
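The "false picture" half of the idea can at least be sketched server-side: a handler that answers any request under some reserved prefix with a page of links to further fake subdirectories, generated on the fly. Everything here is hypothetical (the `/trap/` prefix, the `fake_listing` name); it's a sketch of the fake structure, not of the caching part.

```python
# Sketch of a "spider trap": every request under a fake prefix returns a
# listing that links to more fake subdirectories. Names are derived from
# a hash of the path, so the tree is effectively infinite but fully
# deterministic - no server-side state is needed.
import hashlib

def fake_listing(path, width=3):
    """Return HTML for a fake directory at `path`, linking to `width`
    fake subdirectories with hash-derived names."""
    links = []
    for i in range(width):
        name = hashlib.sha1(f"{path}/{i}".encode()).hexdigest()[:8]
        links.append(f'<a href="{path}{name}/">{name}/</a>')
    return "<html><body>" + "<br>".join(links) + "</body></html>"

print(fake_listing("/trap/"))
```

Because the names are derived from the path, every level of the fake tree looks different, so a naive crawler can descend forever without ever seeing a repeated URL.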

-M

Free your mind

Re: Or spin the bot back? (Was: How to implement a fourth protocol)
by ikegami (Patriarch) on Mar 28, 2007 at 17:25 UTC

    Spiders don't revisit pages they have already visited. That's necessary because web sites are naturally full of loops. For example, the home page links to the contact page, and the contact page links to the home page. Since it doesn't make the same request twice, a client-side cache will never get hit. You can make the client spin, but it will involve a request to your server.