in reply to perl regex or module that identifies bots/crawlers
Mmmm.. started wondering. Couldn't you use something like wpoison (www.monkeys.com/wpoison) to generate an IP blacklist from the crawlers that ignore the robots exclusion protocol, and use that blacklist to dynamically update the firewall rules?
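
The firewall end could be as simple as this minimal sketch; it assumes a Linux box with iptables and enough privilege to change rules, and the path, chain and sub name are my own illustration, nothing wpoison itself ships:

    #!/usr/bin/perl
    # Minimal sketch of the firewall side: append a DROP rule for a
    # trespassing address. Assumes Linux + iptables and enough privilege;
    # the path, chain and sub name are illustrative, not from wpoison.
    use strict;
    use warnings;

    sub blacklist {
        my ($ip) = @_;
        # crude sanity check so we never hand garbage to iptables
        return unless $ip =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;
        system('/sbin/iptables', '-A', 'INPUT', '-s', $ip, '-j', 'DROP') == 0
            or warn "iptables failed for $ip: $?";
    }

    blacklist('192.0.2.10');    # example: drop all traffic from this address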
wpoison generates pages that are clearly marked "off limits" to crawlers, so anything that follows a wpoison-generated page more than (say) 2 levels deep would be a valid candidate for blacklisting.
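
The trap itself could be a CGI page that carries its depth in the query string and feeds the client's address to the firewall once it follows the chain past the threshold; trap.cgi, the depth parameter and the threshold of 2 are all assumptions for the sake of the sketch:

    #!/usr/bin/perl
    # Sketch of the trap page: each poison page links one level deeper
    # via ?depth=N+1; a client that ignores robots.txt and follows the
    # chain past 2 levels gets its IP handed to the firewall. trap.cgi,
    # the depth parameter and the threshold are illustrative assumptions.
    use strict;
    use warnings;
    use CGI qw(:standard);

    my $depth = param('depth');
    $depth = 0 unless defined $depth && $depth =~ /^\d+$/;
    my $ip = remote_addr();

    if ($depth >= 2) {
        blacklist($ip);    # see the iptables sketch above
        print header(-status => '403 Forbidden');
        exit;
    }

    # otherwise serve another poison page, one level deeper
    my $next = $depth + 1;
    print header('text/html'),
          qq{<html><body><a href="/trap.cgi?depth=$next">more</a></body></html>\n};

    sub blacklist {    # same idea as the earlier sketch
        my ($addr) = @_;
        system('/sbin/iptables', '-A', 'INPUT', '-s', $addr, '-j', 'DROP');
    }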
Interesting..
Trying to identify crawlers by signatures is ultimately a losing battle; I've been down that road with spam. Blocking them when they trespass seems a better alternative to me.