in reply to perl regex or module that identifies bots/crawlers

Keep in mind that because that information (the User-Agent string) is supplied by the client, it can't be trusted. Blocking based on it will keep out the honest ones, but there's no guarantee that the "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)" client isn't really Nefarious J. Spammer's Goodtime Spamsalot Webcrawler.

If you're really worried, look into throttling over-active clients as well (I want to say merlyn had a Web Techniques column or three on doing this).

Update: Ahh, yup: "Throttling your web server"; written for mod_perl 1.x and possibly getting long in the tooth, but the underlying concept is still sound even if you can't use the code directly.
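
To make the throttling idea concrete, here's a rough sketch. It is not the code from the column and not mod_perl; it's a plain sliding-window request counter in Python, keyed on the client address rather than the User-Agent (since the latter can be faked). The class name `Throttle`, its parameters, and the limits are all made up for illustration, and a real server would need shared storage across processes.

```python
import time
from collections import defaultdict, deque


class Throttle:
    """Sliding-window request limiter keyed by client address (hypothetical helper)."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests      # assumed limit, tune to taste
        self.window_seconds = window_seconds  # assumed window length
        self.clock = clock                    # injectable so it can be tested
        self.hits = defaultdict(deque)        # client address -> recent request times

    def allow(self, client_addr):
        """Return True if this request is within the limit, False if it should be refused."""
        now = self.clock()
        recent = self.hits[client_addr]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit; the caller might answer with a 503
        recent.append(now)
        return True
```

You'd call `allow()` with the remote address at the top of each request and refuse (or delay) the request when it returns False. Well-behaved clients never notice; a crawler hammering the site gets cut off no matter what it claims to be.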
