Re: Re: Re: How to do session times The Right Way
by PhiRatE (Monk) on Nov 01, 2002 at 00:58 UTC (#209610)
I think the primary issue is that you have an unbounded problem. At any given point in time X, you cannot state "a session open at this point could not possibly have started any earlier than some bound", so you have no choice but to parse the entire log set up to X in order to determine which sessions are open.
Now, you can distribute this cost as you wish, either paying it in full every time you want an answer, or in manageable blocks via checkpointing, database insertion, or whatever, but there is no way to avoid paying it; it's built into the problem.
The solution I offered is the most elegant I can think of: it provides an arbitrary insertion model (no particular limitations such as once-a-day batches), allowing you to distribute the fixed cost as you see fit, and a storage method that gives very low-cost access to the data you require.
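To make the storage idea concrete, here is a minimal sketch (my own illustration, not the original poster's code): each session is stored as a [start, end] pair in epoch seconds, with an undef end for a still-open session. Once the logs have been checkpointed into this form, counting users at time X is a simple scan of the intervals, independent of how the raw logs were ingested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count sessions open at time $x. A session [start, end] is open at $x
# when it started at or before $x and has not yet ended (undef end) or
# ended at or after $x. The array-of-arrays store is illustrative.
sub users_at {
    my ($sessions, $x) = @_;
    my $count = 0;
    for my $s (@$sessions) {
        my ($start, $end) = @$s;
        $count++ if $start <= $x && (!defined $end || $end >= $x);
    }
    return $count;
}

my @sessions = ([100, 200], [150, undef], [250, 300]);
print users_at(\@sessions, 160), "\n";   # sessions 1 and 2 span t=160
```

A real implementation would keep the intervals in a database rather than memory, but the access pattern is the same.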
Except for extremely simple data sets, there are no methods I'm aware of that can improve on this in the general case.
There are special cases where the unbounded nature matters less. For example, consider the case where there is only one user: there is no need to start from the beginning; you can start at point X and simply run backwards through the logs until you find a login or logout for that user, at which point you can answer "1" or "0".
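The single-user backward scan might look like this sketch; the log format here ("epoch user action") is an assumption for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk the log backwards from time $x; the first event for $user that we
# meet (a login or logout) decides whether they were online at $x.
sub user_online_at {
    my ($lines, $user, $x) = @_;   # $lines: arrayref, oldest first
    for my $line (reverse @$lines) {
        my ($t, $u, $action) = split ' ', $line;
        next if $t > $x || $u ne $user;
        return $action eq 'login' ? 1 : 0;
    }
    return 0;   # no event before X: the user never logged in
}

my @log = ("100 alice login", "120 bob login", "150 alice logout");
print user_online_at(\@log, 'alice', 140), "\n";   # alice's last event before 140 is a login
```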
For a small number of users, it may be practical to extend this to track the current status of each user. This still worst-cases to parsing every line of the logs from X back to the start when a given user has never logged in, and with a large number of users you increase the likelihood of hitting that worst case.
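Tracking per-user status amounts to maintaining a hash of user to online flag while streaming the log forward once; a rough sketch (same assumed log format as above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One forward pass over the log; %online then answers "who is on now"
# with a single hash lookup per user.
my %online;
my @log = ("100 alice login", "120 bob login", "150 alice logout");
for my $line (@log) {
    my (undef, $user, $action) = split ' ', $line;
    $online{$user} = ($action eq 'login') ? 1 : 0;
}
my $count = grep { $online{$_} } keys %online;
print "$count\n";   # only bob remains online
```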
It is also possible to bound the answers statistically. To do this, you run a pre-processor over your current log set which determines a bell curve (or equivalent distribution) of session times. You then tell your parser "I would like the number of users at time X to 95% certainty"; with the help of those averages, the parser can calculate how far back in the logs it must trace, at most, to achieve that level of certainty, given the number of users unaccounted for at that point.
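As a sketch of the bounding arithmetic (the percentile helper and the 95% figure are illustrative, not from the post): if the pre-processor finds that 95% of sessions last at most some duration, then scanning back that far from X accounts for at least 95% of the sessions open at X.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nearest-rank percentile over observed session durations (seconds).
sub percentile {
    my ($durations, $p) = @_;
    my @sorted = sort { $a <=> $b } @$durations;
    return $sorted[int($p * $#sorted)];
}

my @durations = (30, 60, 90, 120, 600);      # pre-processed session lengths
my $p95   = percentile(\@durations, 0.95);   # 95th-percentile duration
my $x     = 10_000;                          # the query time X
my $since = $x - $p95;                       # parse no further back than this
print "$since\n";
```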
However, as they say: keep it simple. Daily log parsing, insertion into a database, and then nice simple select queries is by far the most effective and dynamic solution, and one that can be automated with 10 lines of PHP so your executives can pull up their web browsers and make their own requests, instead of calling you :)
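The select query in question might look like this sketch, using an in-memory SQLite database via DBI for illustration; the sessions table schema (start_t/end_t columns, NULL end for open sessions) is an assumption:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# In-memory stand-in for the real database the daily parser inserts into.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE sessions (user TEXT, start_t INTEGER, end_t INTEGER)");
$dbh->do("INSERT INTO sessions VALUES ('alice', 100, 200), ('bob', 150, NULL)");

# The "nice simple select query": how many sessions span time X?
my ($n) = $dbh->selectrow_array(
    "SELECT COUNT(*) FROM sessions
     WHERE start_t <= ? AND (end_t IS NULL OR end_t >= ?)",
    undef, 160, 160);
print "$n\n";   # both sessions span t=160
```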
In Section Seekers of Perl Wisdom