ciryon, I'd argue that if you're regularly generating multi-gig log files you need a more high-powered solution than simple IP address analysis. Consider using a service like WebTrends, or rolling your own. If you can create a fast, secure and accurate WebTrends clone for your local site, you'll have done something impressive (a little futile, perhaps, since WebTrends is cheap, but it will be fun).
Merlyn, while I can't argue with you about code (and your enum example below is nice), I think you're exaggerating Alan Flavell's views as he expressed them. He didn't say (in that message or anything else Google could find for me) that "there are no visitors" or that "IPs are meaningless."
Your assertion that "there are no visitors, only hits" is wrong on its face. The vast majority of web users accept 3rd-party cookies, and services like WebTrends do a spectacular job of tracking first-time, returning and unique visitors.
Can you determine exact unique visitors from log files using IP addresses only? No. Should you use IP addresses to identify users or sessions, or as part of a security process? No. These tasks are either futile, dangerous, or both.
But can you use IP addresses to get a "pretty good" idea of first-time, returning and unique visitors? Yes. There are better methods, but they're much more complex. As long as you know your results won't be very accurate, munging a log file with Perl can be a good, cheap solution. Plus it can be a good exercise, especially for a self-described newbie. So what if AOL users are proxied? They don't all use the same proxy at the same time; you can treat a new hit from an IP as a new session once that IP has been idle for X minutes, and improve your accuracy a bit.
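For what it's worth, here is a minimal sketch of that timeout idea. It assumes Common Log Format input and a 30-minute value for X; the sub names parse_line and count_sessions and the sample addresses are made up for illustration. It only estimates sessions per IP, with all the proxy caveats above.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Parse one Common Log Format line (assumed format) into (ip, epoch seconds).
# Returns an empty list for lines that do not match.
sub parse_line {
    my ($line) = @_;
    my ($ip, $d, $mon, $y, $h, $m, $s, $sign, $oh, $om) = $line =~
        m{^(\S+) \S+ \S+ \[(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\]}
        or return;
    return unless exists $MONTH{$mon};

    # timegm dies on impossible dates, so trap that and skip the line.
    my $epoch = eval { timegm($s, $m, $h, $d, $MONTH{$mon}, $y) };
    return unless defined $epoch;

    # The logged time is local; subtract the zone offset to get UTC.
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return ($ip, $epoch - $offset);
}

# Count approximate sessions per IP: a hit starts a new session when the
# same IP has been idle for more than $timeout seconds.
# Hits are [ip, epoch] pairs in any order.
sub count_sessions {
    my ($timeout, @hits) = @_;
    my (%last_seen, %sessions);
    for my $hit (sort { $a->[1] <=> $b->[1] } @hits) {
        my ($ip, $t) = @$hit;
        $sessions{$ip}++
            if !exists $last_seen{$ip} || $t - $last_seen{$ip} > $timeout;
        $last_seen{$ip} = $t;
    }
    return \%sessions;
}

# Made-up sample lines; in practice you would read the log line by line.
my @sample = (
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:14:05:00 -0700] "GET /a HTTP/1.0" 200 512',
    '192.0.2.1 - - [10/Oct/2000:15:30:00 -0700] "GET / HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:13:56:00 -0700] "GET / HTTP/1.0" 200 2326',
);

my @hits     = map { my @f = parse_line($_); @f ? [@f] : () } @sample;
my $sessions = count_sessions(30 * 60, @hits);
printf "%-15s %d session(s)\n", $_, $sessions->{$_} for sort keys %$sessions;
```

For a multi-gig file you would not want to hold every hit in memory. Since log lines are normally already in time order, you could drop the sort and update %last_seen and %sessions as each line is read.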
"You might as well use Perl's rand function instead."
I know this is just hyperbole on your part, but I think it's a disservice to ciryon. He's a novice asking for advice, and I think we owe him honesty.
--
man with no legs, inc.