Are you addicted to the Chatterbox? Do you seek not-so-professional advice on how to avoid seeking professional advice for your addiction? Do you need continual validation of your addiction through meaningless stats? Do you seek mastery of your addiction by being the strongest, fastest, best? Are you hoping that your addiction is going to become an Olympic event? Are you going through withdrawal because mojotoad's CB stats haven't been updated in three weeks now? (Join the club.)
Well, do I have an answer for you! It's the beta program of a NEW and IMPROVED (how can something be both new AND improved?) CB stats! It still has some stability issues, but those who just need a fix for their craving can find one here! It's similar to the CB stats you're used to, but with some minor improvements (and deprovements) (which is fitting for depravity).
Well, you really shouldn't have clicked on that "readmore" button, but since you have (maybe implicitly by having it always displayed), you are obviously in the target audience. So let's get down to some details. First off, the warnings:
Next, some definitions. Since I basically wrote this thing from scratch, I had full control over what counts as what. So I thought I'd share the definitions so that you can fully exploit your addiction. The following is a rough approximation of the code I'm using for parsing right now, subject to change.
    # some regex's I use in multiple places.
    our $user = qr{
            \[([^\]\s][^\]]+)\]
        |
            \[\s\Qhttp://(?:www.)perlmonks.(?:org|com)/?node(?:_id)=\E([^\s;&=]+)\s\|
    }x;
    my $aggress_user = qr{(?:
            ([^[]\S+)
        |
            $user
    )}x;
    my $aggress = qr{
        /me\s+(?:slaps?|hits?|strikes?|kicks?|throws?\b.*?\bat)\s+$aggress_user
    }x;

    #...

    for my $test (
        [ question => qr/\?(?:\s|$)/ ],
        [ yell => qr/\!(?:\s|$)/ ],
        [ aggressor => sub {
            if (/$aggress/) {
                require URI::Escape;
                my $user = URI::Escape::uri_unescape($+);
                # make sure the user exists...
                $user = CBStats::UserR::fetch($user);
                $user && $user->nodeid() > 0;
            } else {
                return 0;
            }
        } ],
        [ happy => qr/(?:^|\s|\b)[:;B8]-?[)D}P>]+|[(]-?[:;](?:$|\s|\b)/ ],
        [ sad => qr/(?:^|\s|\b):['`]?-?\(+|[)]-?['`]?[:](?:$|\s|\b)/ ],
        [ thought => qr/\.oO\s*\(.*\)/ ],
        [ action => qr/\/me/ ],
        [ aggressee => sub {
            if (/$aggress/) {
                require URI::Escape;
                my $user = URI::Escape::uri_unescape($+);
                # make sure the user exists...
                $user = CBStats::UserR::fetch($user);
                $user && $user->nodeid() > 0 ? $user->nick() : "";
            } else {
                ""
            }
        } ],
        [ words => sub {
            #require Text::ParseWords;
            my @x = split ' ', $_;
            scalar @x;
        } ],
        [ soliliquay => sub {
            my $prev = $self->find_where('MSGID IN (SELECT MAX(MSGID) FROM LOGS WHERE MSGID < ?)', $self->msgid());
            if ($prev) {
                $prev->from() eq $self->from() ? ($prev->soliliquay() || 0) + 1 : 1;
            } else {
                undef
            }
        } ],
    ) {
        my ($action, $check) = @$test;
        if (not defined $self->$action()) {
            $self->$action(
                ref $check eq 'Regexp' ? (/$check/ ? 1 : 0) :
                ref $check eq 'CODE'   ? $check->() :
                $check
            );
        }
    }

    # Also ... karma is found via:
    qr/
        ^$user(\+\+|\-\-)
        \s*
        \#?\s*(.*\S)
    /x

This is here as an explanation of the "big numbers" shown in the stats. The output is trivially derived from the above true/false designations (well, mostly true/false). Well, trivial for a human, but some of these got to be some very complex subqueries in SQL. Of course, if someone wants a change to the above, please let me know.
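For anyone who wants to poke at the definitions without my database, here is a minimal, self-contained sketch of how the test table above gets applied to a single message. It is an illustration, not the production code: `classify` and the `%stats` hash are hypothetical names standing in for the accessor calls on `$self`, and the checks that need a user lookup or the LOGS table (aggressor, aggressee, soliliquay) are left out. The regexes are copied from the code above.

```perl
use strict;
use warnings;

# Each test is [ name => check ]. A compiled regex is recorded as 1/0;
# a code ref is called and its return value is recorded as-is.
my @tests = (
    [ question => qr/\?(?:\s|$)/ ],
    [ yell     => qr/\!(?:\s|$)/ ],
    [ happy    => qr/(?:^|\s|\b)[:;B8]-?[)D}P>]+|[(]-?[:;](?:$|\s|\b)/ ],
    [ sad      => qr/(?:^|\s|\b):['`]?-?\(+|[)]-?['`]?[:](?:$|\s|\b)/ ],
    [ thought  => qr/\.oO\s*\(.*\)/ ],
    [ action   => qr/\/me/ ],
    [ words    => sub { my @x = split ' ', $_; scalar @x } ],
);

# classify() is a hypothetical helper: it returns a hash ref of
# name => value instead of storing the values on a message object.
sub classify {
    my ($text) = @_;
    my %stats;
    for ($text) {    # alias $_ to the message, which is what the checks read
        for my $test (@tests) {
            my ($name, $check) = @$test;
            $stats{$name} =
                  ref $check eq 'Regexp' ? (/$check/ ? 1 : 0)
                : ref $check eq 'CODE'   ? $check->()
                :                          $check;
        }
    }
    return \%stats;
}

my $s = classify('/me wonders: is this thing on? :-)');
printf "%-8s %s\n", $_, $s->{$_} for sort keys %$s;
```

That sample message counts as a question, an action, and happy, with 7 words (the split is on whitespace, so `/me` and the smiley each count as a word).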
The process model is moderately convoluted:
The backend is DB2. Why? Because a) it's probably faster than DBD::CSV ;-), b) it's what we use at $work, and, most importantly, c) the point of these statistics was NOT the generation of the statistic, but to learn RDBMS tools and techniques, and especially to learn some more complex SQL. I'd say it's been a resounding success on the last point, even if the rest of the system falls over tomorrow.
So, what to do next? I'm hoping for two things: 1) perlmonk.org issues to be resolved so I can move the URL (*DONE*), and 2) a more stable CB feed that doesn't chew up more PM resources, at which point I can remove my dependency on X (dependency removed, stability still being worked on). Once the first issue is resolved, I'll send a private message to SiteDocClan to get the site FAQ updated to point to the new stats page (sorry, mojotoad) (*DONE*).
Update: I should point out that when I query to figure out who is the "top" of each category, if there is actually a tie, I favour the newest user. That means that if two people have 87 messages over the last week, the one who joined more recently (i.e., has a higher node ID on perlmonks) gets the higher rank. OTOH, if two people have tied for attacking others, the one with the higher node ID will be given the benefit of the (relative) inexperience and get the lower rating (which could push him/her off the list of two). This may change to the latest case (i.e., the last smiley, the last post, the last attack, whatever).
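The tie-breaking rule can be sketched in plain Perl. The real ranking happens in SQL, so treat this only as an illustration of the ordering: the sample users, the `count` field, and the `rank_users` helper are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data: nick, PerlMonks node ID, and a count for one category.
my @users = (
    { nick => 'alice', nodeid => 1000, count => 87 },
    { nick => 'bob',   nodeid => 5000, count => 87 },
    { nick => 'carol', nodeid => 3000, count => 42 },
);

# Rank by count, highest first. On a tie, $newest_first decides whether the
# higher node ID (the more recent joiner) sorts ahead of or behind the other.
sub rank_users {
    my ($newest_first, @rows) = @_;
    return sort {
        $b->{count} <=> $a->{count}
            or ($newest_first ? $b->{nodeid} <=> $a->{nodeid}
                              : $a->{nodeid} <=> $b->{nodeid})
    } @rows;
}

my @by_messages = rank_users(1, @users);    # ties favour the newer user
my @by_attacks  = rank_users(0, @users);    # ties push the newer user down

print join(', ', map { $_->{nick} } @by_messages), "\n";
print join(', ', map { $_->{nick} } @by_attacks),  "\n";
```

With this data, the message ranking puts bob ahead of alice, while the attack ranking puts alice ahead of bob; carol is last either way.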
Update2: Changed the data gathering, but only slightly. (Thanks, ambrus, for the base code.)
Update3: Changed the URL. Future changes will be noted on that site, not in this node.
Re: Chatterbox Addicts not-so-anonymous
by jdporter (Paladin) on Sep 10, 2008 at 18:04 UTC
Re: Chatterbox Addicts not-so-anonymous
by jvector (Friar) on Sep 10, 2008 at 18:49 UTC
Re: Chatterbox Addicts not-so-anonymous
by Limbic~Region (Chancellor) on Sep 10, 2008 at 23:44 UTC