Re: How to Iterate to Identify Concurrent Users (trees, merge)
by tye (Sage) on Oct 26, 2007 at 01:36 UTC
|
This is exactly the type of problem that I've wanted to solve with Perl that lead me to propose that Perl add tree-based, very efficient, sorted hashes in a future version (such that entries in the hash are sorted by key).
Since Perl doesn't have that, you could certainly find a module that implements hashes with sorted keys and use that (if the max number of concurrent users is not huge, then the performance should be acceptable). Then you read the "start" records in chronological order and insert them into a sorted hash with the "end" date as the key. Then you basically do a merge on the "start" records that you are reading and on the ever-changing hash of still-current users. Each time you insert a new "start" record into the hash, you first remove all of the items having an "end" date prior to the new "start" date. Track the size of the hash at each step to see how many concurrent users you have as you progress through (past) time.
The other route I'd go is producing a second stream of records sorted by "end" time. You could do this by putting the records into a database with an index on the "start" time and an index on the "end" time (or just do an "order by" on the computed "end" time, if your problem size isn't too large for that to be reasonable). Or you could just produce a second log file with the "end" time up front and use an efficient external 'sort' command to put that second log in the desired order.
Then you merge the two streams. Reading a record from the "start" stream adds an item to your hash of "concurrent users" (keyed by user ID this time). Reading a record from the "end" stream deletes an item from your hash.
If you have the added complexity of perhaps multiple transactions per user, then you'd add a second hash recording the number of concurrent transactions for each user and delete each user when their count goes to zero.
| [reply] |
|
|
| [reply] |
Re: How to Iterate to Identify Concurrent Users (SQL)
by fenLisesi (Priest) on Oct 26, 2007 at 08:50 UTC
|
You could use SQL, as tye hinted. Suppose you load your data into a table named sesn (session) with fields sesn_id, usr_id, sesn_bgn, sesn_end and whatever else you may need, properly indexed etc. Then, a query like the following should do what you want (untested):
SELECT
foo.usr_id AS user,
foo.sesn_id AS session1,
bar.sesn_id AS session2,
foo.sesn_bgn AS start1,
bar.sesn_bgn AS start2,
foo.sesn_end AS end1,
bar.sesn_end AS end2
FROM
sesn foo,
sesn bar
WHERE
foo.usr_id = bar.usr_id
AND bar.sesn_bgn >= foo.sesn_bgn
AND bar.sesn_bgn <= foo.sesn_end
AND foo.sesn_id != bar.sesn_id
-- AND (conditions on min duration etc here)
Here I have assumed that you are trying to track multiple simultaneous sessions by the same user. If that's not what you meant, adjust accordingly.
Hope this helps. Cheers.
| [reply] [d/l] [select] |
|
|
Thanks tye and fenLisesi. I initially started with a database model but found it to be more cumbersome. After your good advice, I created a view and ran fenLisesi query with some modifications. It did the trick. Its slow though but it gets me what I need. I'll benchmark it with Perl code to see which is more efficient. You monks rock!!!
create view foo as select session_id, start_session, end_session, user
+_id from bar;
</code>
Then I ran
SELECT DISTINCT foo.user_id AS user,
foo.session_id AS session1,
bar.session_id AS session2,
foo.start_session AS start1,
bar_.start_session AS start2,
foo.end_session AS end1,
bar_.end_session AS end2 FROM foo,bar
WHERE foo.user_id != bar_.user_id
AND bar_.start_session >= foo.start_session
AND bar_.start_session <= foo.end_session
AND foo.session_id != bar_.session_id
order by start1 ASC;
| [reply] [d/l] [select] |
Re: How to Iterate to Identify Concurrent Users
by GrandFather (Saint) on Oct 25, 2007 at 23:58 UTC
|
What do you want to output? A single number indicating the greatest number of concurrent users? A list indicating the number of concurrent users whenever that number changes? A list of hourly (or some other period) reports indicating the maximum number of concurrent users in that period? Something else?
If you have a list of connect/disconnect times then you can sort the list by event, then iterate through the list incrementing or decrementing a connected count as appropriate for each event.
Perl is environmentally friendly - it saves trees
| [reply] |
|
|
Quite simply, I would like to output the total number of concurrent users. The list is sorted since it is chronologically based. Thanks Grandfather.
| [reply] |
|
|
| [reply] |
|
|
|
|
|
|
Do you mean "the maximum number of concurrent users over the span of the log file"? The total of one number is rather uninteresting, especially when you're not clear on what that number is.
| [reply] |
|
|
|
|
|
|