in reply to perl regex or module that identifies bots/crawlers

Mmmm.. started wondering. Couldn't you use something like wpoison (www.monkeys.com/wpoison) to build an IP blacklist from the crawlers that ignore the Robots Exclusion Protocol, and use that blacklist to dynamically update the firewall rules? wpoison generates pages that are clearly marked as "off limits" to crawlers, so anything that keeps following wpoison-generated links for more than (say) 2 levels would be a valid candidate for blacklisting.
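Something along these lines, maybe. This is only a rough sketch of the idea, not wpoison itself: the trap path, depth threshold, log file and iptables call are all my own assumptions, and the script would need to run with enough privileges to touch the firewall (and the trap URL should be Disallowed in robots.txt so well-behaved crawlers never see it).

    #!/usr/bin/perl
    # trap.pl -- hypothetical wpoison-style trap page.
    # Any client that keeps following trap links past 2 levels is
    # assumed to be ignoring robots.txt and gets blacklisted.
    use strict;
    use warnings;
    use CGI qw(:standard);

    my $ip    = $ENV{REMOTE_ADDR} || 'unknown';
    my $depth = param('depth') || 0;    # how deep into the trap we are

    if ( $depth > 2 && $ip ne 'unknown' ) {
        # Record the offender...
        open my $log, '>>', '/var/log/crawler-blacklist' or die "log: $!";
        print {$log} "$ip\n";
        close $log;

        # ...and drop it at the firewall (needs root, e.g. via sudo).
        system( 'iptables', '-A', 'INPUT', '-s', $ip, '-j', 'DROP' );
    }

    # Serve another page of bogus links, one level deeper.
    print header('text/html');
    my $next = $depth + 1;
    print "<html><body>\n";
    print qq{<a href="/trap.pl?depth=$next;p=$_">more</a>\n} for 1 .. 20;
    print "</body></html>\n";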

Interesting..

Trying to identify crawlers by signatures is ultimately a losing battle. I've been down that road with spam.
Blocking them when they trespass seems a better alternative to me.