This is exactly the type of problem I've wanted to solve with Perl, and it is what led me to propose that a future version of Perl add very efficient, tree-based sorted hashes (hashes whose entries are kept sorted by key).

Since Perl doesn't have that, you could find a module that implements hashes with sorted keys and use that (if the maximum number of concurrent users is not huge, the performance should be acceptable). Read the "start" records in chronological order and insert each one into a sorted hash with its "end" date as the key. In effect you are doing a merge between the "start" records you are reading and the ever-changing hash of still-current users. Each time you are about to insert a new "start" record into the hash, first remove all of the items whose "end" date is prior to the new "start" date. Track the size of the hash at each step to see how many concurrent users you have as you progress through (past) time.
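Here is a minimal sketch of that first approach. Rather than assume any particular sorted-hash module, it stands in a plain array of "end" times kept in ascending order (which also sidesteps the problem of two sessions ending at the same moment colliding as hash keys). The record layout (`[start, end]` pairs in epoch seconds, already sorted by start) and the name `peak_concurrency` are my own assumptions for illustration.

```perl
use strict;
use warnings;

# Assumed input: a list of [start, end] pairs, sorted by start time.
# Returns the largest number of sessions open at the same time.
sub peak_concurrency {
    my @sessions = @_;
    my @ends;       # "end" times of still-open sessions, ascending
    my $peak = 0;
    for my $s (@sessions) {
        my ($start, $end) = @$s;
        # Remove every session that ended prior to this start.
        shift @ends while @ends && $ends[0] < $start;
        # Binary search for the slot that keeps @ends sorted.
        my ($lo, $hi) = (0, scalar @ends);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            if ($ends[$mid] <= $end) { $lo = $mid + 1 } else { $hi = $mid }
        }
        splice @ends, $lo, 0, $end;
        # The size of the structure is the current concurrency.
        $peak = @ends if @ends > $peak;
    }
    return $peak;
}
```

The `splice` makes each insert linear in the number of open sessions, which is the "acceptable if the number of concurrent users is not huge" trade-off; a real tree-based structure would bring that down to logarithmic.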

The other route I'd go is producing a second stream of records sorted by "end" time. You could do this by putting the records into a database with an index on the "start" time and an index on the "end" time (or just do an "order by" on the computed "end" time, if your problem size isn't too large for that to be reasonable). Or you could just produce a second log file with the "end" time up front and use an efficient external 'sort' command to put that second log in the desired order.

Then you merge the two streams. Reading a record from the "start" stream adds an item to your hash of "concurrent users" (keyed by user ID this time). Reading a record from the "end" stream deletes an item from your hash.
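A rough sketch of the two-stream merge follows. For brevity it builds both streams with an in-memory `sort`, where a real run over a big log would read them from the database or the externally sorted files. The record layout (`[user_id, start, end]`) and the name `concurrency_timeline` are assumptions of mine, and it assumes each user has at most one open session at a time.

```perl
use strict;
use warnings;

# Assumed input: a list of [user_id, start, end] records (epoch seconds).
# Returns a list of [time, concurrent_user_count] pairs, one per event.
sub concurrency_timeline {
    my @records  = @_;
    my @by_start = sort { $a->[1] <=> $b->[1] } @records;   # "start" stream
    my @by_end   = sort { $a->[2] <=> $b->[2] } @records;   # "end" stream
    my %current;    # user_id => 1 for every user currently logged in
    my @timeline;
    my ($i, $j) = (0, 0);
    while ($i < @by_start) {
        if ($j < @by_end && $by_end[$j][2] < $by_start[$i][1]) {
            # Next event in time is an "end": the user leaves.
            delete $current{ $by_end[$j][0] };
            push @timeline, [ $by_end[$j][2], scalar keys %current ];
            $j++;
        }
        else {
            # Next event in time is a "start": the user arrives.
            $current{ $by_start[$i][0] } = 1;
            push @timeline, [ $by_start[$i][1], scalar keys %current ];
            $i++;
        }
    }
    # Drain the remaining "end" records once the "start" stream is exhausted.
    for (; $j < @by_end; $j++) {
        delete $current{ $by_end[$j][0] };
        push @timeline, [ $by_end[$j][2], scalar keys %current ];
    }
    return @timeline;
}
```

When an "end" and a "start" carry the same timestamp, this sketch processes the "start" first, so the two users count as overlapping; flip the comparison to `<=` if you want the opposite.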

If you have the added complexity of perhaps multiple transactions per user, then you'd add a second hash recording the number of concurrent transactions for each user and delete each user when their count goes to zero.
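The per-user counting could look something like this. It assumes the two streams have already been merged into a single time-ordered list of `[time, type, user_id]` events, where `type` is `'start'` or `'end'`; that layout and the name `concurrent_users_after` are hypothetical.

```perl
use strict;
use warnings;

# Assumed input: time-ordered [time, type, user_id] events,
# with type being 'start' or 'end'.
# Returns the number of distinct concurrent users after each event.
sub concurrent_users_after {
    my @events = @_;
    my %open;       # user_id => number of open transactions for that user
    my @counts;
    for my $e (@events) {
        my (undef, $type, $user) = @$e;
        if ($type eq 'start') {
            $open{$user}++;
        }
        elsif ($open{$user} && --$open{$user} == 0) {
            # Last open transaction closed: the user is no longer concurrent.
            delete $open{$user};
        }
        push @counts, scalar keys %open;
    }
    return @counts;
}
```

Here the count hash doubles as the "concurrent users" hash, since a user is present exactly when their count is above zero. An "end" for a user with no open transactions is silently ignored.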

- tye        


In reply to Re: How to Iterate to Identify Concurrent Users (trees, merge) by tye
in thread How to Iterate to Identify Concurrent Users by Anonymous Monk
