Interesting... And how would knowing the time someone stays on a page help you determine whether it was a bot or a human who accessed the page? And even more important: why do you need to know this? Do you want to refuse access to bots? Then put a robots.txt file on your server. "Bad bots" will not be stopped by that, of course, but as far as I know no other technology can stop them either, provided the bad bots are equipped with a modicum of intelligence.
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Checking the User-Agent header generally works pretty well, as does robots.txt.
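As a rough illustration of the user-agent check, here is a minimal Python sketch. The token list and the function name `looks_like_declared_bot` are my own assumptions for the example, not a complete or authoritative list; it only catches crawlers that identify themselves honestly.

```python
import re

# Hypothetical token list for illustration only; well-behaved crawlers
# advertise many other names, and rogue bots simply lie.
BOT_UA_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def looks_like_declared_bot(user_agent):
    """Return True when the User-Agent string self-identifies as a crawler."""
    return bool(BOT_UA_PATTERN.search(user_agent or ""))
```

A missing or empty header is treated as "not a declared bot" here; whether that is the right default depends on what you do with the result.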
If you actually need to identify (rogue) bots which use browser UA strings and ignore robots.txt, your best chance would be to look at the patterns in the timestamps for when pages are requested:
- Humans will generally either open one page at a time, making single requests at wildly irregular intervals (possibly interspersed with HEAD requests when they use the "back" button), or open everything in tabs, producing flurries of several requests within a few seconds followed by generally longer intervals of few-or-no requests.
- Bots will tend to request pages at a relatively steady rate - even if they have randomness in their delay, it's rarely more than half the base interval - and often quicker than a human would.
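One way to turn the timing patterns above into a number is the coefficient of variation (standard deviation divided by mean) of the gaps between one client's requests. The Python sketch below is only that, a sketch: the function names, the 0.5 threshold and the 10-request minimum are assumptions I picked for illustration, and you would need to tune them against your own logs. Timestamps are assumed to be seconds (e.g. epoch times) already grouped per client.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation (stdev / mean) of the gaps between requests.

    Returns None when there are too few requests to say anything.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average


def looks_automated(timestamps, max_cv=0.5, min_requests=10):
    """Flag a client whose request gaps are suspiciously regular.

    max_cv and min_requests are assumed values, not established cutoffs.
    """
    if len(timestamps) < min_requests:
        return False
    cv = interval_cv(timestamps)
    return cv is not None and cv <= max_cv
```

A steady crawler with modest jitter produces a low value, while a human's mix of quick bursts and long pauses pushes it well above 1. A bot that deliberately mimics bursty human timing would slip past this, so treat it as one signal among several.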
Don't rely on JavaScript to make your determination. Some of us use the NoScript plugin, which blocks JavaScript from running unless it comes from a whitelisted site, but we're still not bots.
Anyhow, though, what are you attempting to accomplish by identifying what's a bot and what isn't?
You might find the visualization presented in O'Reilly's "A New Visualization for Web Server Logs" interesting. In some cases, automated access will stand out quite clearly, and it may help you determine what criteria you want to use if you want to automate detection.