tanger has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm trying to make a WhosOnline feature for my Teen Community Online. To do this, I thought that when someone logs in, it writes to a DBM file. The key would be the member's name and the value would be the time/date they logged in. Then every time they click to an area of the page, it gives them a new time/date. Each area also checks the time/date they logged in or last did something in an area, subtracts it from the current time, and sees whether they've been idle for 40 minutes. If so, it removes them from the DBM file. Would this work out? If not, can you tell me another way to do it?


Re: A Time Stamp problem
by dws (Chancellor) on Feb 07, 2001 at 12:52 UTC
    Sure, you can do this with dbm, but you might want to consider alternatives after thinking about the problem.

    (I assume that by "login" you mean that they've been authenticated by the application, and not the OS, and that you've set a tracking cookie that you can reference whenever the user hits a page or invokes a script on the web site.)

    As you've noted, to present the list of who is online in a given area you need to answer the question "which authenticated users have hit pages in this area within the last N minutes?" How to best do this depends on a number of factors, including how large your community is, what kind of peak traffic you expect, and whether you want to maintain data for more than N minutes.

    Fundamentally, there are two approaches: you can keep your data (usernames) ordered by most recent access time, or you can update a field in records that are unordered with respect to access time.

    In the ordered case, you start scanning the (ordered) records until the last-accessed time falls out of range. Unless you expect large peaks in visits, chances are good that you can keep a relatively short list of recent visitors, culling the list at update time. Assuming that the disk blocksize is 8K, and the average length of a username is 10 characters, you can safely keep up to 400 "username timestamp\n" records in a per-area flat file that you can read with a single disk access, and can update with a single access (if you truncate the file first). You may well have had more visitors within the past N minutes, but past some smaller number, I doubt that you're going to want to display that list on a webpage. (Consider the "Other Users" box to the right. It gets unwieldy after ~30 users.)
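
    A minimal sketch of that ordered, per-area flat file (the file name, the 40-minute cutoff, and the cap on kept visitors are assumptions for illustration, not a fixed recipe):

        #!/usr/bin/perl -w
        # Per-area "recently seen" list: one "username timestamp" line
        # per visitor, newest first.
        use strict;
        use Fcntl qw(:flock);

        my $AREA_FILE = 'chat_area.who';
        my $CUTOFF    = 40 * 60;   # 40 minutes, in seconds
        my $MAX_SHOWN = 30;        # don't bother keeping more than this

        sub record_visit {
            my ($user) = @_;
            my $now = time;

            unless (-e $AREA_FILE) {            # first visitor ever
                open my $new, '>', $AREA_FILE or die "create $AREA_FILE: $!";
                close $new;
            }

            open my $fh, '+<', $AREA_FILE or die "open $AREA_FILE: $!";
            flock $fh, LOCK_EX            or die "lock $AREA_FILE: $!";

            # Keep only other users still inside the cutoff; the file is
            # ordered newest-first, so the first stale line ends the scan.
            my @recent;
            while (my $line = <$fh>) {
                my ($name, $stamp) = split ' ', $line;
                next if $name eq $user;
                last if $now - $stamp > $CUTOFF or @recent >= $MAX_SHOWN;
                push @recent, "$name $stamp\n";
            }

            # Rewrite the file with the current visitor on top.
            seek $fh, 0, 0;
            truncate $fh, 0;
            print $fh "$user $now\n", @recent;
            close $fh;
        }

    Reading the list for display is just the same scan without the rewrite.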

    In the unordered case, you need to sequentially scan records, checking the last-accessed time of each. This won't present any performance penalties for a small community (~1600 "name timestamp\n" records can fit into 4 disk pages in a flat file, and into a few dozen pages if you're using dbm or MySQL.)

    Now consider that you're going to be doing an update for every page fetched. Update in the ordered (by access time) case is fast if you limit your bookkeeping to the number of records that you're actually going to display. Update in the unordered case is also quick, assuming that your database is indexed on username.

    So which is best for you?

    Assuming that you have a large community, want to keep a last-accessed time for each user, and are willing to limit the number of "recent" visitors you'll display to a sane number, you might do well to consider a hybrid scheme: use dbm (or MySQL) to maintain your user data, including last-accessed times, and use a flat file to keep a short list of visitors ordered by last access.

      Ouch. The members' data is all kept in DBM format. ;'( I have 105 members and am expecting many more when we start our advertising campaign. The creator of the script said a site (50megs.com, which was sold to about.com) was running the script with 900,000+ members on Apache on a Pentium III 700 MHz box. I'm learning SQL right now and looking forward to building the 'WhosOnline' feature on it. Would my server be slow because it holds the member data in DBM? And would it take forever to convert it to a MySQL database? Thanks.
Re: A Time Stamp problem
by AgentM (Curate) on Feb 07, 2001 at 09:16 UTC
    Another way? Sure. If each user has his own login, just use User::Utmp and steal your calculations (40 minutes?) from there. You could also try installing Zephyr, which can manage the messaging system, which you can then pipe to the web pages.

    Of course, I suspect that every person is under one and the same "wwwuser" user, in which case my above suggestions are worthless. Perlmonks.org has the functionality you are looking for, and it may well be worth digging out of the Everything engine. Another option would be to cook up something from scratch that would end up looking like the PM code anyway - something like: keep a timestamp in the DB along with the username. Every five minutes or so, you could look through and grab the timestamps attached to users that are <40 minutes away from NOW.
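
    For illustration, a rough sketch of that "timestamp next to the username" idea with a tied DBM file (the file name and the 40-minute cutoff are assumptions about the setup):

        use strict;
        use Fcntl;
        use SDBM_File;

        my $username = 'tanger';          # whoever just hit the page

        tie my %online, 'SDBM_File', 'online', O_RDWR|O_CREAT, 0644
            or die "can't tie online file: $!";

        # On every page hit: stamp the user with the current time.
        $online{$username} = time;

        # Every five minutes or so: who has been seen in the last 40 minutes?
        my $cutoff = time - 40 * 60;
        my @whos_online = grep { $online{$_} >= $cutoff } keys %online;

        untie %online;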

    I really hope you are not really running the whole site off of DBM files, since these don't scale well at all. You should seriously consider a real database like MySQL or something similar.

    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
Re: A Time Stamp problem
by dkubb (Deacon) on Feb 07, 2001 at 09:52 UTC

    As far as a data source goes, you could use anything from a flat file to DBM, all the way up to a relational database like MySQL.

    I would think your choice would be determined by a few factors:

    • the speed you'll require
    • the time/ability to set up the solution

    My personal preference would be to use a database, but that's because I already have one set up, and I'm comfortable with it and DBI. I always find myself using flat files/DBM and wishing for something faster, usually not the other way around.

    Whichever you choose, try to make sure the code that saves "session" information to the data source is encapsulated in an object or routine. This will make it easier on you if you need to upgrade or change your mind about the data source in the future, after some live tests.
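
    For example (the table and column names are made up, and this assumes DBI with the MySQL driver), a single routine like this is the only thing that would have to change if the backend did:

        use strict;
        use DBI;

        my $dbh = DBI->connect('dbi:mysql:database=community', 'user', 'password',
                               { RaiseError => 1 })
            or die DBI->errstr;

        # The one place that knows how "last seen" gets stored.
        sub touch_session {
            my ($dbh, $username) = @_;
            $dbh->do(
                'REPLACE INTO online_users (username, last_seen) VALUES (?, ?)',
                undef, $username, time,
            );
        }

        touch_session($dbh, 'tanger');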

    The second decision to make is: how are you going to update the data source each time a user goes into a different area of your site?

    If your entire site is Perl-driven, you might be looking at modifying all of those scripts to keep the "who is" list up to date on every access. Depending on how many scripts there are, and how important downtime is to your users, this could range from very easy to difficult.

    A neat alternative would be to use something called a Web Bug to track your users through your site. A web bug is simply a 1 x 1 pixel image tag on all your HTML pages, where the src URL points to a single CGI script. The CGI will be loaded along with all the other images on the page. Make sure the code in the bug uses CGI.pm's header() method, with "expires" set to 0 seconds, so that the browser doesn't cache the image, like so:

        use CGI;

        my $cgi = CGI->new;
        print $cgi->header(
            -type    => 'image/gif',
            -expires => '+0s',
        );
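
    Filled out into a complete bug script (the cookie name, DBM file, and dot.gif path are assumptions about your setup), the idea is to note the hit and then hand back the uncacheable 1 x 1 image:

        #!/usr/bin/perl -w
        use strict;
        use CGI;
        use Fcntl;
        use SDBM_File;

        my $cgi  = CGI->new;
        my $user = $cgi->cookie('username');    # however your site identifies people

        # Note the hit: stamp the visitor with the current time.
        if (defined $user) {
            tie my %online, 'SDBM_File', 'online', O_RDWR|O_CREAT, 0644
                or die "can't tie online file: $!";
            $online{$user} = time;
            untie %online;
        }

        print $cgi->header( -type => 'image/gif', -expires => '+0s' );

        # Serve an existing 1x1 transparent GIF rather than hard-coding bytes.
        open my $img, '<', 'dot.gif' or die "can't open dot.gif: $!";
        binmode $img;
        local $/;                               # slurp the whole file
        print <$img>;
        close $img;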

    A good tip would be to place the web bug at the top of all your pages so it gets loaded first. AFAIK, most browsers load images in the order they find them in the HTML. IMHO this is probably easier than changing an entire Perl-driven web site, since you're only editing HTML.

    Disclaimer: Be careful with this, though, as some people can be offended by web bugs, since they have been known to allow easier "profiling" of people on the Internet. Also, if people have their images turned off, the bug won't load, although this is probably not a big limitation.

    If speed is important to you in this case, and I think it should be, then you will want the routine that updates the user's name and last-access timestamp to do this - and only this - quickly. You probably don't want to "reap" old sessions from the data source on every request to your site. Scanning the whois list that many times is fairly inefficient.

    A better idea would be to offload this onto a crontab that runs at set intervals and reaps sessions older than $mins. It is far more efficient to do it this way, because you're only doing a full whois list scan and deletion once in a while, not on every request.
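
    A reaper along those lines, meant to be run from cron (the DBM file name and the 40-minute cutoff are assumptions), could be as small as:

        #!/usr/bin/perl -w
        use strict;
        use Fcntl;
        use SDBM_File;

        my $CUTOFF = 40 * 60;   # seconds of idle time before a session is dropped

        tie my %online, 'SDBM_File', 'online', O_RDWR|O_CREAT, 0644
            or die "can't tie online file: $!";

        my $now = time;
        for my $user (keys %online) {   # keys() returns a snapshot, so
                                        # deleting while looping is safe
            delete $online{$user} if $now - $online{$user} > $CUTOFF;
        }

        untie %online;

    With a crontab entry something like "*/5 * * * * /path/to/reap_sessions.pl" (the path is made up), the per-request code never has to pay for the cleanup.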

    Hope this helps.