Let's put it this way: when I don't block at all, my load average can peak well over 200, leaving me unable to even log in via ssh. It has also caused my system to crash. I ran a script that monitors the load average, and whenever it goes over 20, it reports the top active programs. At those points, it's always the search scripts responding to crawlers. (Research suggests others are having the same problem. See http://www.jensense.com/archives/2006/06/yahoo_search_ma.html)
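For anyone wanting to reproduce that kind of monitoring, here is a minimal Python sketch of the idea, not the actual script. The threshold of 20 comes from the description above; the function names, the choice of the 1-minute load average, and the `ps` invocation (which assumes Linux procps and its `--sort` option) are all assumptions made for illustration.

```python
import os
import subprocess

LOAD_THRESHOLD = 20.0  # from the post: report when the load goes over 20


def is_overloaded(load_1min, threshold=LOAD_THRESHOLD):
    """Return True when the 1-minute load average exceeds the threshold."""
    return load_1min > threshold


def top_processes(limit=5):
    """Return the ps header plus the `limit` busiest processes by CPU.

    Assumes a procps-style `ps` that understands `--sort` (typical on Linux).
    """
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm", "--sort=-pcpu"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[: limit + 1]


def check_once():
    """Sample the load once; return the top processes only if overloaded."""
    load_1min, _, _ = os.getloadavg()
    if is_overloaded(load_1min):
        return top_processes()
    return []
```

Calling `check_once()` from cron or a sleep loop and logging any non-empty result would give a record of what was running at each spike.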
Since I installed the simplistic test, my load has never gone over 1.0, even with my site's usual traffic of over 25,000 unique visitors a day. According to my runtime logs, anywhere between 20 and 50 users are doing searches at any given moment.
The remaining bots that I don't check for aren't hurting the server, per se, but they are polluting my stats on what people search for. (I really want better data on what people are coming to my site FOR. The crawlers seem to be searching on random words.)
I also want to add more search options, but because those would burn even more CPU cycles, I'd rather wait until I can really block out the cruft from these remaining bots.
I'm not yet concerned about blocking bots that try to pass themselves off as normal users; they haven't proven to be much of a problem. I can detect illicit activity by watching for searches done in a short timeframe (such as within a second of the last one). That's a sure sign of a non-human, but I'd rather nip the problem in the bud if I can.
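The timing check described above can be sketched roughly as follows, in Python. This is an illustration under assumptions, not the code running on the site: the class name, the one-second default, and keying clients by something like an IP address are all hypothetical choices.

```python
MIN_INTERVAL = 1.0  # seconds; taken from "within a second of the last one"


class RapidSearchDetector:
    """Flag a client whose search follows its previous one too quickly."""

    def __init__(self, min_interval=MIN_INTERVAL):
        self.min_interval = min_interval
        self.last_seen = {}  # client key (e.g. IP address) -> last search time

    def is_suspicious(self, client, now):
        """Record this search and report whether it came too soon.

        `now` is a timestamp in seconds, such as the value of time.time().
        """
        previous = self.last_seen.get(client)
        self.last_seen[client] = now
        return previous is not None and (now - previous) < self.min_interval
```

A search handler would call `is_suspicious(client, time.time())` once per request. A first request from a client is never flagged, since there is no earlier search to compare against.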