in reply to Unique visits - Webserver log parser

Anyone have suggestions for improving this?
Yes, give up on this idea:
I want to find out the number of unique visits for each log file.
An IP address is not a "visitor". Many users can sit behind a single proxy IP, and a single user can arrive from multiple IPs when going through a large proxy farm like AOL's.

There are no visitors, only hits. See one of Alan Flavell's messages for a fuller explanation. Just search for "Flavell visitors" for various takes by this legendary Web Expert on the futility of your task.

You might as well use Perl's rand function instead.

-- Randal L. Schwartz, Perl hacker

Re: •Re: Unique visits - Webserver log parser
by ciryon (Sexton) on Feb 27, 2002 at 10:24 UTC
    Sorry, hits is what I meant.
not quite right (was: Unique visits - Webserver log parser)
by legLess (Hermit) on Feb 28, 2002 at 05:33 UTC
    ciryon, I'd argue that if you're regularly generating multi-gig log files you need a more high-powered solution than simple IP address analysis. Consider using a service like WebTrends, or rolling your own. If you can create a fast, secure and accurate WebTrends clone for your local site, you'll have done something impressive (a little futile, perhaps, since WebTrends is cheap, but it will be fun).

    Merlyn, while I can't argue with you about code (and your enum example below is nice), I think you're exaggerating Alan Flavell's views as he expressed them. He didn't say (in that message or anything else Google could find for me) that "there are no visitors" or that "IPs are meaningless."

    Your assertion that "there are no visitors, only hits" is wrong on its face. The vast majority of web users accept 3rd-party cookies, and services like WebTrends do a spectacular job of tracking first-time, returning and unique visitors.

    Can you determine exact unique visitors from log files using IP addresses only? No. Should you use IP addresses to identify users or sessions, or as part of a security process? No. These tasks are either futile, dangerous, or both.

    But can you use IP addresses to get a "pretty good" idea of first-time, returning and unique visitors? Yes. There are better methods, but they're much more complex. As long as you know your results won't be very accurate, munging a log file with Perl can be a good, cheap solution. Plus it can be a good exercise, especially for a self-described newbie. So what if AOL users are proxied? They don't all use the same proxy at the same time; you can time sessions out after X minutes and improve your accuracy a bit.
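A minimal sketch of that timeout idea, for illustration only. It assumes log lines in Common Log Format, arriving in chronological order, and treats a hit as starting a new "session" when its IP has been idle for longer than the timeout. The names clf_to_epoch and count_sessions are made up for this example, and the counts are only the rough estimate described above, not real visitors.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviations as they appear in Common Log Format timestamps.
my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 0 .. 11;

# Convert a timestamp such as "27/Feb/2002:10:24:00 +0000" to epoch seconds.
# Returns undef if the string does not look like a CLF timestamp.
sub clf_to_epoch {
    my ($stamp) = @_;
    my ($d, $mon, $y, $h, $m, $s, $sign, $oh, $om) =
        $stamp =~ m{\A(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\z}
        or return;
    return unless exists $MONTH{$mon};
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    # timegm treats the fields as UTC, so subtract the zone offset.
    return timegm($s, $m, $h, $d, $MONTH{$mon}, $y) - $offset;
}

# Count "sessions" per IP: a hit opens a new session when the IP has not
# been seen before, or was last seen more than $timeout seconds ago.
# Takes an array ref of log lines; returns a hash ref of IP => count.
sub count_sessions {
    my ($lines, $timeout) = @_;
    my (%last_seen, %sessions);
    for my $line (@$lines) {
        # Host is the first field; the timestamp sits in square brackets.
        my ($ip, $stamp) = $line =~ /^(\S+) \S+ \S+ \[([^\]]+)\]/
            or next;
        my $t = clf_to_epoch($stamp);
        next unless defined $t;
        $sessions{$ip}++
            if !exists $last_seen{$ip} || $t - $last_seen{$ip} > $timeout;
        $last_seen{$ip} = $t;
    }
    return \%sessions;
}
```

Calling count_sessions(\@lines, 30 * 60) gives per-IP counts with a 30-minute timeout. For multi-gig logs you would read the file line by line rather than slurping it into an array, but the bookkeeping stays the same.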
    You might as well use Perl's rand function instead.
    I know this is just hyperbole on your part, but I think it's a disservice to ciryon. He's a novice asking for advice, and I think we owe him honesty.
    --
    man with no legs, inc.