Are you addicted to the Chatterbox? Do you seek not-so-professional advice on how to avoid seeking professional advice for your addiction? Do you need continual validation of your addiction through meaningless stats? Do you seek mastery of your addiction by being the strongest, fastest, best? Are you hoping that your addiction is going to become an Olympic event? Are you going through withdrawal because mojotoad's CB stats haven't been updated in three weeks now? (Join the club.)

Well, do I have an answer for you! It's the beta program of a NEW and IMPROVED (how can something be both new AND improved?) CB stats! It still has some stability issues, but those who just need a fix for their craving can find one here! It's similar to the CB stats you're used to, but with some minor improvements (and deprovements) (which is fitting for depravity).

Well, you really shouldn't have clicked on that "readmore" button, but since you have (maybe implicitly by having it always displayed), you are obviously in the target audience. So let's get down to some details. First off, the warnings:

Next, some definitions. Since I wrote this thing basically from scratch, I had full control over what constituted what. So I thought I'd share the definitions so that you can fully exploit your addiction. The following is a rough approximation of the code I'm using for parsing right now, subject to change.

    # some regex's I use in multiple places.
    our $user = qr{
        \[([^\]\s][^\]]+)\]
      |
        \[\s\Qhttp://(?:www.)perlmonks.(?:org|com)/?node(?:_id)=\E([^\s;&=]+)\s\|
    }x;
    my $aggress_user = qr{(?: ([^[]\S+) | $user )}x;
    my $aggress = qr{
        /me\s+(?:slaps?|hits?|strikes?|kicks?|throws?\b.*?\bat)\s+$aggress_user
    }x;
    #...
    for my $test (
        [ question => qr/\?(?:\s|$)/ ],
        [ yell => qr/\!(?:\s|$)/ ],
        [ aggressor => sub {
              if (/$aggress/) {
                  require URI::Escape;
                  my $user = URI::Escape::uri_unescape($+);
                  # make sure the user exists...
                  $user = CBStats::UserR::fetch($user);
                  $user && $user->nodeid() > 0;
              } else {
                  return 0;
              }
          } ],
        [ happy => qr/(?:^|\s|\b)[:;B8]-?[)D}P>]+|[(]-?[:;](?:$|\s|\b)/ ],
        [ sad => qr/(?:^|\s|\b):['`]?-?\(+|[)]-?['`]?[:](?:$|\s|\b)/ ],
        [ thought => qr/\.oO\s*\(.*\)/ ],
        [ action => qr/\/me/ ],
        [ aggressee => sub {
              if (/$aggress/) {
                  require URI::Escape;
                  my $user = URI::Escape::uri_unescape($+);
                  # make sure the user exists...
                  $user = CBStats::UserR::fetch($user);
                  $user && $user->nodeid() > 0 ? $user->nick() : "";
              } else {
                  ""
              }
          } ],
        [ words => sub {
              #require Text::ParseWords;
              my @x = split ' ', $_;
              scalar @x;
          } ],
        [ soliliquay => sub {
              my $prev = $self->find_where(
                  'MSGID IN (SELECT MAX(MSGID) FROM LOGS WHERE MSGID < ?)',
                  $self->msgid());
              if ($prev) {
                  $prev->from() eq $self->from()
                      ? ($prev->soliliquay() || 0) + 1
                      : 1;
              } else {
                  undef
              }
          } ],
    ) {
        my ($action, $check) = @$test;
        if (not defined $self->$action()) {
            $self->$action(
                  ref $check eq 'Regexp' ? (/$check/ ? 1 : 0)
                : ref $check eq 'CODE'   ? $check->()
                :                          $check
            );
        }
    }

    # Also ... karma is found via:
    qr/ ^$user(\+\+|\-\-) \s* \#?\s*(.*\S) /x

This is here as an explanation of the "big numbers" shown in the stats. The output is trivially derived from the above true/false designations (well, mostly true/false). Well, trivial for a human, but some of these turned into very complex subqueries in SQL. Of course, if someone wants a change to the above, please let me know.
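
To make the true/false designations concrete, here is a minimal, standalone sketch of how a few of the simpler tests could flag a single line. Only the regexes are copied from the listing above; `classify_line` and the sample message are hypothetical, and the real code stores its results on a message object rather than in a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash of flags for one chat line.
# The regexes are the ones from the listing; everything else is illustrative.
sub classify_line {
    my ($line) = @_;
    my @words = split ' ', $line;
    return {
        question => ($line =~ /\?(?:\s|$)/    ? 1 : 0),
        yell     => ($line =~ /\!(?:\s|$)/    ? 1 : 0),
        thought  => ($line =~ /\.oO\s*\(.*\)/ ? 1 : 0),
        action   => ($line =~ /\/me/          ? 1 : 0),
        words    => scalar @words,
    };
}

# A made-up message that trips several tests at once.
my $flags = classify_line('/me wonders .oO( is this thing on? )');
printf "%s=%s\n", $_, $flags->{$_} for sort keys %$flags;
```

Note that one line can set several flags at once, which is why the totals in the stats are not mutually exclusive.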

The process model is moderately convoluted:

  1. The data gathering is done "live." This just records the raw data from its source (currently IRC).
  2. Data transformation. There is a cron job running every 6 minutes to see what has transpired recently, and run the above code. Note that the user fetch function includes querying PM if I've never seen that nick before (I'm presuming that nicks never change node ids here). Once I have a cache of a user, I can also set the user to be hidden so they don't show up in the stats (this convolutes the SQL like I would never have thought possible). This is just Perl, and should only need updating if I want new data.
  3. Output. There is a cron job running every hour to run a couple dozen SQL statements in a Template Toolkit plugin (via ttree), and then upload the resulting html file to the destination. This job also will prune anything over a week old (I hope - I don't have a week's worth of data yet, but that's why the "earliest" and "latest" timestamps are showing on the stats page), and run the data transformation again (just in case there's data from between the last transformation cron run and now).
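
The "query PM only if I've never seen that nick" part of step 2 is a plain lookup cache. A rough sketch follows, assuming an in-memory hash instead of the real database table; `lookup_remote`, `fetch_user`, the nicks, and the node ids are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real query against PerlMonks;
# it consults a fixed table so the sketch runs offline.
my %fake_site = (alice => 1001, bob => 1002);
my $remote_lookups = 0;

sub lookup_remote {
    my ($nick) = @_;
    $remote_lookups++;
    return $fake_site{$nick};    # undef when the nick is unknown
}

my %cache;    # nick => node id, or 0 for "looked up, does not exist"

sub fetch_user {
    my ($nick) = @_;
    # Only go to the remote source the first time a nick is seen.
    $cache{$nick} = lookup_remote($nick) // 0 unless exists $cache{$nick};
    return $cache{$nick};
}

fetch_user($_) for qw(alice bob alice carol alice);
print "remote lookups: $remote_lookups\n";
```

Caching the "does not exist" answer as well means a mistyped nick costs only one remote query, which matters if the goal is to go easy on PM resources.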
A little expansion on a point: if you don't want your nick to show up on the stats, despite this all being for fun, simply let me know. I have an entry in a table for this express purpose, and already have 3 people on the list. Your lines will STILL be counted in the aggregate, but your name won't show up. That means that if you wrote 200 lines in the week, it'll still affect the "most active times" and any monk or user references you make will still be added into the total numbers. However, if you were the last user to mention "[bart]", as an example, the name that will show up in the stats will actually be the PREVIOUS person to mention "[bart]". Well, the previous non-hidden user anyway. Which, of course, means that the stats are invalid. But that just means that we already know they're invalid instead of pretending otherwise ;-)
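
The hidden-user rule can be sketched in plain Perl. The real thing is done in SQL, so this is only an illustration of the logic, and the log entries, nicks, and `last_visible_mention` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data: messages in chronological order, plus a set of hidden nicks.
my @log = (
    { from => 'alice', text => 'ask [bart] about that' },
    { from => 'bob',   text => '[bart] would know' },
    { from => 'carol', text => 'thanks, [bart]' },
);
my %hidden = (carol => 1);

# Return the most recent non-hidden nick that mentioned [$target].
sub last_visible_mention {
    my ($target, $log, $hidden) = @_;
    for my $msg (reverse @$log) {
        next if $hidden->{ $msg->{from} };
        return $msg->{from} if index($msg->{text}, "[$target]") >= 0;
    }
    return;    # nobody visible mentioned the target
}

print "last to mention bart: ", last_visible_mention('bart', \@log, \%hidden), "\n";
print "lines counted in aggregate: ", scalar(@log), "\n";
```

Here carol is hidden, so bob is reported as the last to mention bart, while all three lines still count toward the totals.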

The backend is DB2. Why? Because a) it's probably faster than DBD::CSV ;-), b) it's what we use at $work, and, most importantly, c) the point of these statistics was NOT the generation of the statistic, but to learn RDBMS tools and techniques, and especially to learn some more complex SQL. I'd say it's been a resounding success on the last point, even if the rest of the system falls over tomorrow.

So, what to do next? I'm hoping for two things: 1) perlmonk.org issues to be resolved so I can move the URL (*DONE*), and 2) a more stable CB feed that doesn't chew up more PM resources, at which point I can remove my dependency on X (dependency removed, stability still being worked on). Once the first issue is resolved, I'll send a private message to SiteDocClan to get the site FAQ updated to the new stats page (sorry, mojotoad) (*DONE*).

Update: I should point out that when I query to figure out who is the "top" of each category, if there is actually a tie, I favour the newest user. That means that if two people have 87 messages over the last week, the one who joined more recently (i.e., has a higher node ID on perlmonks) gets the higher rank. OTOH, if two people have tied for attacking others, the one with the higher node ID will be given the benefit of the (relative) inexperience and get the lower rating (which could push him/her off the list of two). This may change to the latest case (i.e., the last smiley, the last post, the last attack, whatever).
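
As a sketch of that tie-breaking, assuming the rows have already been pulled out of the database (the real ordering happens in SQL, and the nicks, node ids, and counts below are made up):

```perl
use strict;
use warnings;

# Hypothetical rows: nick, PerlMonks node id, and a per-week count.
my @rows = (
    { nick => 'oldtimer', nodeid => 100, count => 87 },
    { nick => 'newcomer', nodeid => 900, count => 87 },
    { nick => 'quiet',    nodeid => 500, count => 12 },
);

# Ordinary categories: highest count first, ties go to the higher node id.
sub rank_favourable {
    return sort { $b->{count} <=> $a->{count} || $b->{nodeid} <=> $a->{nodeid} } @_;
}

# Attack categories: highest count first, ties put the higher node id lower.
sub rank_attackers {
    return sort { $b->{count} <=> $a->{count} || $a->{nodeid} <=> $b->{nodeid} } @_;
}

print join(', ', map { $_->{nick} } rank_favourable(@rows)), "\n";
print join(', ', map { $_->{nick} } rank_attackers(@rows)), "\n";
```

Both orderings use the same primary key (the count); only the direction of the node id comparison flips.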

Update2: Changed the data gathering, but only slightly. (Thanks, ambrus, for the base code.)

Update3: Changed the URL. Future changes will be noted on that site, not in this node.


In reply to Chatterbox Addicts not-so-anonymous by Tanktalus
