Re: Re: Re: How to do session times The Right Way

by PhiRatE (Monk)
on Nov 01, 2002 at 00:58 UTC (#209610)

in reply to Re: Re: How to do session times The Right Way
in thread How to do session times The Right Way

I think the primary issue you have is that you have an unbounded problem. For any point in time X, you cannot state "a session open at this point could not possibly have started any earlier than some fixed bound", so you have no choice but to parse the entire log set up to X in order to determine which sessions are open.
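To make that cost concrete, here is a minimal Python sketch of the brute-force approach: read every line from the start of the log up to X and track who is logged in. The log format (sorted tuples of timestamp, user, and "login"/"logout") and the function name are assumptions for illustration, not anything from the original thread.

```python
def open_sessions_at(events, x):
    """Return the set of users with an open session at time x.

    events: iterable of (timestamp, user, kind) sorted by timestamp,
    where kind is "login" or "logout" (a hypothetical log format).
    """
    open_users = set()
    for timestamp, user, kind in events:
        if timestamp > x:
            break  # lines after x cannot affect the answer
        if kind == "login":
            open_users.add(user)
        elif kind == "logout":
            open_users.discard(user)
    return open_users


# Hypothetical sample log
log = [
    (10, "alice", "login"),
    (20, "bob", "login"),
    (30, "alice", "logout"),
    (40, "carol", "login"),
]
print(sorted(open_sessions_at(log, 35)))  # ['bob']
```

Every query pays for a scan of everything before X, which is exactly the cost described above.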

Now, you can distribute this cost as you will, either paying it every time you want an answer or in manageable blocks via checkpointing, database insertion, whatever, but there is no way to avoid paying that cost; it's built into the problem.

The solution I offered is the most elegant I can think of, in that it provides an arbitrary insertion model (no particular limitations like once a day), allowing you to distribute the fixed cost as you see fit, and a storage method that gives you very cheap (low big-O) access to the data you require.

Except for extremely simple data sets, there are no methods I'm aware of that can improve on this in the general case.

There are special cases wherein the unbounded nature is not so relevant. For example, consider the case where there is only one user. In this situation it is unnecessary to start from the beginning: you can start from the X point and simply run backwards through the logs until you find a start or end for that user, then you can say "1" or "0".
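A rough Python sketch of that backward scan, assuming the log is an in-memory list of (timestamp, user, kind) tuples sorted by time. The format and names are hypothetical, and the user filter is only there so the same function works on a log that happens to contain other users.

```python
def user_is_online(events, x, user):
    """Return 1 if user is online at time x, else 0, scanning backwards."""
    for timestamp, who, kind in reversed(events):
        if timestamp > x or who != user:
            continue  # line is after x, or belongs to someone else
        # The most recent event at or before x decides the answer.
        return 1 if kind == "login" else 0
    return 0  # reached the start of the log: no event for this user


# Hypothetical sample log
log = [
    (10, "alice", "login"),
    (20, "bob", "login"),
    (30, "alice", "logout"),
    (40, "carol", "login"),
]
print(user_is_online(log, 35, "bob"))    # 1
print(user_is_online(log, 35, "alice"))  # 0
```

The scan stops at the first relevant line it meets, so the typical cost depends on how recently the user did something, not on the size of the log.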

For a small number of users, it may be practical to extend this to track the current status of each user. This still worst-cases to parsing every line of the logs from X back to the start if a given user has never logged in. With a large number of users, however, you increase the likelihood of hitting that worst-case scenario.
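Here is one way the multi-user version might look in Python, assuming the set of users is known up front (an assumption the sketch needs in order to know when to stop). It also returns the number of lines read so the worst case is visible. All names and the log format are hypothetical.

```python
def online_count(events, x, users):
    """Count users online at time x by scanning backwards from the end.

    Stops as soon as every user in `users` has been resolved.
    Returns (online_count, lines_read).
    """
    unresolved = set(users)
    online = 0
    lines_read = 0
    for timestamp, user, kind in reversed(events):
        if not unresolved:
            break  # every user's status at x is already known
        lines_read += 1
        if timestamp > x or user not in unresolved:
            continue
        unresolved.discard(user)  # latest event at or before x decides
        if kind == "login":
            online += 1
    return online, lines_read


# Hypothetical sample log
log = [
    (10, "alice", "login"),
    (20, "bob", "login"),
    (30, "alice", "logout"),
    (40, "carol", "login"),
]
print(online_count(log, 35, {"alice", "bob"}))          # (1, 3)
print(online_count(log, 35, {"alice", "bob", "dave"}))  # (1, 4)
```

In the second call the hypothetical user "dave" never appears in the log, so the scan runs all the way back to the first line.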

It is also possible to bound the answers statistically. To do this, you run a pre-processor over your current log set which fits a bell curve (or equivalent distribution) to the session times. You then tell your parser "I would like the number of users at time X to a 95% certainty". With the help of that distribution, the parser can calculate how far back in the logs it must trace, at most, to achieve that level of certainty, given the number of users still unaccounted for at that point.
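A simplified Python sketch of the statistical idea, under some loud assumptions: session durations are treated as roughly normal, the look-back window is just the duration that only about 5% of sessions exceed, and the refinement for the number of users still unaccounted for is left out. The log format and function names are hypothetical.

```python
from statistics import NormalDist


def session_durations(events):
    """Pair each login with the same user's next logout; return durations."""
    started = {}
    durations = []
    for timestamp, user, kind in events:
        if kind == "login":
            started[user] = timestamp
        elif kind == "logout" and user in started:
            durations.append(timestamp - started.pop(user))
    return durations


def lookback_window(durations, certainty=0.95):
    """Duration that a session exceeds with probability 1 - certainty,
    assuming durations follow a bell curve (needs at least two samples)."""
    model = NormalDist.from_samples(durations)
    return max(0.0, model.inv_cdf(certainty))


def online_count_bounded(events, x, window):
    """Backward scan that gives up once it is `window` before x."""
    seen = set()
    online = 0
    for timestamp, user, kind in reversed(events):
        if timestamp > x:
            continue
        if timestamp < x - window:
            break  # older lines are ignored; this is the accepted error
        if user not in seen:
            seen.add(user)  # latest event at or before x decides
            if kind == "login":
                online += 1
    return online


# Hypothetical sample log
log = [
    (0, "alice", "login"),
    (10, "bob", "login"),
    (40, "alice", "logout"),
    (45, "bob", "logout"),
    (50, "carol", "login"),
    (100, "carol", "logout"),
]
window = lookback_window(session_durations(log))
print(online_count_bounded(log, 60, window))
```

The trade is explicit: a session that started more than `window` before X is missed, which under the fitted model should happen for roughly 5% of sessions.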

However, as they say, keep it simple. Daily log parsing, insertion, and then nice simple select queries make this by far the most effective and dynamic solution, and one that is easy to automate with 10 lines of PHP so your executives can pull up their web browser and make their own request, instead of calling you :)
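For illustration, the "insert, then select" half might look something like this, using Python's built-in sqlite3 module rather than PHP. The table name, column names, and sample rows are all hypothetical, and a real setup would load the rows from the daily log parse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    "username TEXT, start_time INTEGER, end_time INTEGER)"
)

# One row per session; end_time is NULL while the session is still open.
rows = [("alice", 10, 30), ("bob", 20, None), ("carol", 40, 60)]
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)


def users_online_at(conn, x):
    """Count sessions that started at or before x and had not ended by x."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM sessions "
        "WHERE start_time <= ? AND (end_time IS NULL OR end_time > ?)",
        (x, x),
    ).fetchone()
    return count


print(users_online_at(conn, 35))  # 1
print(users_online_at(conn, 45))  # 2
```

With an index on the time columns, each question becomes one cheap query, and the parsing cost is paid once at insertion time.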

Re: Re: Re: Re: How to do session times The Right Way
by strider corinth (Friar) on Nov 01, 2002 at 14:59 UTC
    You're right. I was wrong to say that your solution isn't elegant; it is. I guess the main thing is that I was looking for a whole new algorithm: something I could do only in Perl. For practical purposes, your solution will work very well, because it relies on existing technologies (see your last paragraph =). I actually asked the person I'm doing this work for this morning if they'd like me to use the method you described. It turns out that we don't have any database space to use. =)


    Love justice; desire mercy.
