in reply to User tracking

The server will normally never be informed when the user "leaves" (for any definition of "leave") the page. Usually I have several tabs open within my browser. Am I "staying" on all these pages all of the time? What if I open another browser? Have I then "left" all the pages in the previous browser?

The best you can hope to achieve with a little bit of javascript is to get notified when the user "closes" your page, but beyond telling you that the page is no longer showing on the user's system, such a message has --IMHO-- nothing significant to tell you.

What are you really trying to achieve?

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^2: User tracking
by gemoroy (Beadle) on May 18, 2009 at 08:30 UTC

    I am trying to distinguish bots from people.
    It's quite hard to do because of the flexibility of libraries such as libwww...
    And the presence of JS can't be the main sign.
      Interesting ...

      And how would knowing the time someone stays on a page help you in determining whether it is a bot or a human who accessed the page?

      And even more important: why do you need to know this? Do you want to refuse access to bots? Then include a robots.txt file on your server. "Bad bots" will of course not be stopped, but as far as I know no other technology will be able to stop them either, provided the bad bots are equipped with a modicum of intelligence.
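
      For what it's worth, a minimal robots.txt asking all well-behaved crawlers to stay out of a directory is only a couple of lines (the path here is just a placeholder, of course):

          User-agent: *
          Disallow: /cgi-bin/

      Polite crawlers will honour it; the "bad bots" mentioned above will simply ignore it.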

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Checking user-agent generally does work pretty well, as does robots.txt.
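
      To illustrate the user-agent check (just a sketch, not anyone's production code; the pattern of crawler names below is only an example), a CGI script could do something along these lines:

          #!/usr/bin/perl
          # Rough sketch: treat requests whose User-Agent header contains
          # common crawler names as bots. The pattern is only an example.
          use strict;
          use warnings;

          my $ua = $ENV{HTTP_USER_AGENT} || '';
          my $looks_like_bot = $ua =~ /(?:bot|crawl|spider|slurp)/i;

          print "Content-type: text/plain\n\n";
          print $looks_like_bot ? "Hello, robot.\n" : "Hello, human.\n";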

      If you actually need to identify (rogue) bots which use browser UA strings and ignore robots.txt, your best chance would be to look at the patterns in the timestamps for when pages are requested (a rough sketch of this follows the list below):

      • Humans will generally either open one page at a time, making single requests at wildly irregular intervals (possibly interspersed with HEAD requests when they use the "back" button), or open everything in tabs, producing flurries of several requests within a few seconds followed by generally longer intervals of few-or-no requests.
      • Bots will tend to request pages at a relatively steady rate - even if they have randomness in their delay, it's rarely more than half the base interval - and often quicker than a human would be.
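
      A minimal Perl sketch of that interval heuristic, assuming an Apache-style access log on STDIN and a purely made-up regularity cut-off, might look something like this:

          #!/usr/bin/perl
          # Sketch only: flag IPs whose inter-request intervals are suspiciously
          # regular. Assumes common-log-format lines with a
          # [dd/Mon/yyyy:hh:mm:ss +zzzz] timestamp; the 0.5 threshold is made up.
          use strict;
          use warnings;
          use Time::Local qw(timegm);

          my %mon = ( Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
                      Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11 );
          my %hits;    # IP address => list of request times (epoch seconds)

          while (<>) {
              next unless m{^(\S+) .*? \[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+)};
              my ( $ip, $d, $m, $y, $h, $min, $s ) = ( $1, $2, $3, $4, $5, $6, $7 );
              push @{ $hits{$ip} }, timegm( $s, $min, $h, $d, $mon{$m}, $y );
          }

          for my $ip ( sort keys %hits ) {
              my @t = sort { $a <=> $b } @{ $hits{$ip} };
              next if @t < 10;    # too few requests to say anything useful
              my @gaps = map { $t[$_] - $t[ $_ - 1 ] } 1 .. $#t;
              my $mean = 0;  $mean += $_ for @gaps;                 $mean /= @gaps;
              my $var  = 0;  $var  += ( $_ - $mean )**2 for @gaps;  $var  /= @gaps;
              my $cv   = $mean ? sqrt($var) / $mean : 0;  # coefficient of variation
              # A low CV means very regular intervals, which smells more like a
              # bot than like a human's bursty tab-opening.
              printf "%s: %d requests, mean gap %.1fs, CV %.2f\n",
                  $ip, scalar @t, $mean, $cv
                      if $cv < 0.5;
          }

      It is only a heuristic, of course: a patient bot with long, randomised delays will look perfectly human to it.
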
      Don't rely on javascript to make your determination. Some of us use the NoScript plugin, which blocks javascript from running unless it comes from a whitelisted site, but we're still not bots.

      Anyhow, what are you attempting to accomplish by identifying what's a bot and what isn't?

      You might find the visualization presented in O'Reilly's A New Visualization for Web Server Logs interesting. In some cases, automated access will stand out quite clearly, and it may help you determine what criteria you want to use if you want to automate detection.