

in reply to How to do session times The Right Way

I'm confused. Surely this is simply a case of requiring a single row per session recording the start and stop times, indexed within a database, i.e.:
    start_time      , stop_time       , userid
    20020901101111  , 20021002121201  , fred
    20020904102010  , NULL            , johnwhoneverlogsoff
With such a table, indexed appropriately, the query:
    SELECT COUNT(*) FROM sessions
     WHERE start_time < $end_of_segment
       AND (stop_time > $start_of_segment OR stop_time IS NULL);

should give you the count, and quickly too, since the database can make good use of its indexes in that instance.
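
If you drive that from Perl, a minimal DBI sketch might look like the following (the DSN, credentials, and segment values are placeholders; bound parameters stand in for the interpolated variables above):

    use strict;
    use warnings;
    use DBI;

    # Placeholder connection details; substitute your own.
    my $dbh = DBI->connect('dbi:mysql:sessions_db', 'user', 'pass',
                           { RaiseError => 1 });

    # Segment boundaries in the same YYYYMMDDhhmmss form as the table.
    my ($start_of_segment, $end_of_segment)
        = ('20020901000000', '20020902000000');

    # Bound parameters avoid quoting problems with interpolation.
    my ($count) = $dbh->selectrow_array(
        'SELECT COUNT(*) FROM sessions
          WHERE start_time < ?
            AND (stop_time > ? OR stop_time IS NULL)',
        undef, $end_of_segment, $start_of_segment,
    );
    print "$count concurrent sessions\n";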

I would have thought such a table would be a common part of an ISP setup, but if it isn't, you may find the table advantageous for other kinds of session-related data as well.

In either case, the only task left is to take your logs and shunt them into a database table of the above format, matching stops with the appropriate starts as it goes. (I am told that since the session time is included in the stop message, you could also just scan stops and calculate each start time from that.) This could be done once per hour, once per day, or on demand as necessary. A bit of intelligence in the parser will allow it to continue from where it left off last time, resulting in little CPU burn to get up to date. Hope that helps.
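
As a rough illustration, here is a minimal Perl sketch of such a parser. It assumes a hypothetical log format of whitespace-separated records like "1033560721 STOP fred 3650" (stop time in epoch seconds, userid, session length in seconds; real log formats will differ), with the checkpoint kept as a saved byte offset:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $log       = 'sessions.log';      # assumed filenames
    my $statefile = 'sessions.offset';

    # Resume from where we left off last run, so each pass only
    # burns CPU on new lines.
    my $offset = 0;
    if (open my $st, '<', $statefile) {
        chomp($offset = <$st> // 0);
        close $st;
    }

    open my $fh, '<', $log or die "can't open $log: $!";
    seek $fh, $offset, 0 or die "can't seek: $!";

    while (my $line = <$fh>) {
        my ($stop, $type, $user, $secs) = split ' ', $line;
        next unless defined $type and $type eq 'STOP';
        # Derive the start time from the stop record, as noted above,
        # rather than pairing each stop with an earlier start line.
        insert_session($stop - $secs, $stop, $user);
    }

    # Checkpoint the new offset for next time.
    open my $st, '>', $statefile or die "can't write $statefile: $!";
    print $st tell($fh), "\n";
    close $st;

    sub insert_session {
        my ($start, $stop, $user) = @_;
        # Stub: in practice, a DBI INSERT into the sessions table above.
        print "would insert: $user $start $stop\n";
    }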

Re: Re: How to do session times The Right Way
by strider corinth (Friar) on Oct 31, 2002 at 22:02 UTC
    That is really just the second option I suggested (translate "data structure" to "database"). I'll admit that it's faster than I had been envisioning, since I wasn't thinking exactly along the lines of databases, but it still isn't what I think of as an elegant solution, hence my initial post. I'm more interested in this from a programming theory standpoint than from a practical one. I guess I should've made that clearer in my original post: I want to know whether this can be done in Perl, without the use of other technologies.

    Another assumption you (and others) have made is that I'm at an ISP, and that I'm using RADIUS. I'm not. =)

    On the plus side, your answer is certainly the sort the person I'm doing this work for would like. It's simple, and it uses technology already available to us. Thanks for taking the time to respond with it.
    --

    Love justice; desire mercy.
      I think the primary issue you have is that this is an unbounded problem. At any given point in time, you cannot state "a session operating at this point in time could not possibly have started any earlier than X"; you have no choice but to parse the entire log set up until X in order to determine which sessions are open.

      Now, you can distribute this cost as you will, either paying it every time you want an answer, or in manageable blocks via checkpointing, database insertion, or whatever, but there is no way to avoid paying it; it's built into the problem.

      The solution I offered is the most elegant I can think of, in that it provides an arbitrary insertion model (no particular limitations such as once a day), allowing you to distribute that fixed cost as you see fit, and a storage method that gives very cheap, indexed access to the data you require.

      Except for extremely simple data sets, there are no methods I'm aware of that can improve on this in the general case.

      There are special cases wherein the unbounded nature is not so relevant. For example, consider the case where there is only one user; in this situation it is unnecessary to start from the beginning. You can start from the point X and simply run backwards through the logs until you find a start or end for that user, and then you can say "1" or "0".
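
      For instance, a sketch of that single-user check using the CPAN module File::ReadBackwards (log format as in the parser sketch above, with START records as well; the filenames and userids are assumptions):

          use strict;
          use warnings;
          use File::ReadBackwards;

          # Was $user on at the end of the log? The first record we meet
          # scanning backwards is their most recent, so it settles the
          # question without reading the whole log.
          sub logged_in {
              my ($logfile, $user) = @_;
              my $bw = File::ReadBackwards->new($logfile)
                  or die "can't read $logfile: $!";
              while (defined(my $line = $bw->readline)) {
                  my ($ts, $type, $who) = split ' ', $line;
                  next unless defined $who and $who eq $user;
                  return $type eq 'START' ? 1 : 0;
              }
              return 0;   # never seen in the log: not logged in
          }

          print logged_in('sessions.log', 'fred') ? "1\n" : "0\n";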

      For a small number of users, it may be practical to extend this to track the current status of each user. This still degenerates, in the worst case, to parsing every line of the logs from X back to the start when a given user has never logged in, and with a large number of users you increase the likelihood of hitting that worst case.
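
      Extending the sketch above to several users might look like this (again over the same assumed format; the user list here is invented for illustration):

          use strict;
          use warnings;
          use File::ReadBackwards;

          # The first record met scanning backwards settles each user's
          # status; stop as soon as everyone is accounted for.
          my @known_users = qw(fred johnwhoneverlogsoff);   # assumed list
          my %pending = map { $_ => 1 } @known_users;
          my %status;                       # user => 'START' or 'STOP'

          my $bw = File::ReadBackwards->new('sessions.log')
              or die "can't read log: $!";
          while (%pending) {
              defined(my $line = $bw->readline) or last;
              my ($ts, $type, $user) = split ' ', $line;
              next unless defined $user and delete $pending{$user};
              $status{$user} = $type;
          }

          # Anyone left in %pending never appears in the log at all.
          my $count = grep { $_ eq 'START' } values %status;
          print "$count users on at end of log\n";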

      It is also possible to bound the answers statistically. To do this, you run a pre-processor over your current log set which fits a bell curve (or equivalent) to the session lengths. You can then tell your parser "I would like the number of users at time X to 95% certainty", and the parser, with the help of those averages, can calculate the furthest it must trace back in the logs to achieve that level of certainty, given the number of users unaccounted for at that point.
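
      A back-of-the-envelope sketch of that calculation (the session lengths here are invented sample data; 1.645 is the one-sided 95% point of the normal distribution):

          use strict;
          use warnings;

          # Pre-processing pass output: past session lengths in seconds.
          my @lengths = (3600, 1800, 7200, 5400);   # sample data, assumed

          # Fit a normal approximation and find how far before X the
          # parser must start so that ~95% of still-open sessions would
          # already have shown their START record by that point.
          my $n    = @lengths;
          my $mean = 0;
          $mean += $_ / $n for @lengths;
          my $var = 0;
          $var += ($_ - $mean) ** 2 / ($n - 1) for @lengths;
          my $horizon = $mean + 1.645 * sqrt($var);
          printf "trace back %.0f seconds before X\n", $horizon;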

      However, as they say: keep it simple. Daily log parsing and insertion, followed by nice simple SELECT queries, is by far the most effective and dynamic solution, and one that can be automated with 10 lines of PHP so your executives can pull up their web browsers and make their own requests instead of calling you :)

        You're right. I was wrong to say that your solution isn't elegant; it is. I guess the main thing is that I was looking for a whole new algorithm: something I could do in Perl alone. For practical purposes, your solution would work very well. Because it relies on existing technologies (see your last paragraph =), I actually asked the person I'm doing this work for this morning whether they'd like me to use the method you described. It turns out that we don't have any database space to use. =)

        --

        Love justice; desire mercy.