Counting events and getting top 5 matches

iphony has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Counting events and getting top 5 matches by Corion (Patriarch) on Sep 17, 2008 at 08:45 UTC
The only code you show is SQL, so I'm giving you an SQL answer: `select code, username, userid, count() as code_count group by code, username, userid order by code, code_count` [download] To get the "top five" out of that select statement, you can use the RANK() function: `select code, username, userid, count() as code_count, rank() over (ORDER BY code_count DESC) as code_rank where code_rank <= 5 group by code, username, userid` [download] The `rank()` function doesn't seem to be supported on Sybase ASE. Maybe you can create the select as a view and then use some `rowid` trickery to get at the top five users per code group.	[reply] [d/l] [select]
Re^2: Counting events and getting top 5 matches by psini (Deacon) on Sep 17, 2008 at 09:11 UTC
++ BTW, I'm a strong supporter of the view that when you can have the DBMS do the work, you must. You never know when your DB will be on a different machine and having tons of data flowing around the net only to arrive to a program that count them is... illogical. Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."	[reply]
Re: Counting events and getting top 5 matches by jonadab (Parson) on Sep 17, 2008 at 11:37 UTC
I'm a strong supporter of the view that when you can have the DBMS do the work, you must. I'm not so sure that's always the case. There can be a real tradeoff between performance and maintainability there. I've seen some of the hairy twenty-page stored procedures people write when they're trying to do something in SQL that is fundamentally better suited to a general-purpose high-level language and would be less than a screenful of clean, easy-to-read Perl. Certainly, there are times when it's better to have the DBMS do the work... -- We're working on a six-year set of freely redistributable Vacation Bible School materials.	[reply]
Re^2: Counting events and getting top 5 matches by BrowserUk (Patriarch) on Sep 17, 2008 at 15:14 UTC
Re^2: Counting events and getting top 5 matches by psini (Deacon) on Sep 17, 2008 at 12:48 UTC
Re^2: Counting events and getting top 5 matches by moritz (Cardinal) on Sep 17, 2008 at 09:11 UTC
without `rank()` you can simply add a `limit 5`. Or am I missing some obvious glitch here?	[reply] [d/l] [select]
Re^3: Counting events and getting top 5 matches by Corion (Patriarch) on Sep 17, 2008 at 09:18 UTC
This will give you the overall top 5 users, not the top five users per code group. Given: `code userid count 1 corion 1000 2 corion 999 3 corion 888 4 corion 777 5 corion 6 1 moritz 555 2 moritz 444 3 moritz 333 4 moritz 222 5 moritz 1` [download] ordering by `count,code` would never show `moritz`, and you'd always miss out on low-counting codes (like code 5). Ordering by `code, count` would leave out codes with very few users (if you have 1000 users of code 1 and only 3 of code 5).	[reply] [d/l] [select]
Re^3: Counting events and getting top 5 matches by psini (Deacon) on Sep 17, 2008 at 09:22 UTC
With LIMIT 5 you get the top 5 within the whole table. What the OP wanted is the top 5 for every value of code. It can be obtained without RANK but you need a subquery (or perhaps two) and this can reduce performances. Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."	[reply]
Re^2: Counting events and getting top 5 matches by iphony (Acolyte) on Sep 17, 2008 at 08:51 UTC
Thanks for the help. Actually I wanted to do the operation in Perl, as I already got the sql result into an array. I will try and see how it goes by doing it in SQL.	[reply]
Re: Counting events and getting top 5 matches by broomduster (Priest) on Sep 17, 2008 at 08:56 UTC
Assuming you have your counts in a hash, say `%code_count` (with code as the keys), then `use strict; use warnings; my %code_count = ( a => 5, b => 4, c =>10, d => 1, e => 2, f => 17, ); print "$_ $code_count{$_}\n" for (sort { $code_count{$b} <=> $code_count{$a} } keys %code_count + )[0..4]; __DATA__ f 17 c 10 a 5 b 4 e 2` [download] will work. This sorts the hash by value (in largest-first order), and uses a slice to pull off the top five. Updated: Added sample data and output.	[reply] [d/l] [select]
Re: Counting events and getting top 5 matches by jonadab (Parson) on Sep 17, 2008 at 11:14 UTC
I need to do two things: 1) get the number of occurrences of each code 2) get the top 5 username/userid pair for each of these code My SQL isn't up to that sort of processing (I mostly just use SQL for data _retrieval_), so I'm going to go for a Perl answer. I assume you are using DBI and already have a connection to the database in question; I'm going to call the DBI connection $db for lack of a better name. my $query = $db->prepare("select code, username, userid from transac +tions"); $query->execute(); my %code = map { $_ => [] } 0 .. 13; while (my $r = $qeury->fetchrow_hashref()) { my $c = $$r{code}; push @{$code{$c}}, $r; } # For each code $c, $code{$c} is now an arrayref which # contains the relevant records (as hashrefs). The # number of elements in the array equals the number # of occurances. Now, for the second part... my %topfive; for my $c (keys %code) { my %user; for my $row (@{$code{$c}}) { # Note: this assumes that the userid field # is unique to each user. $user{$$row{userid}}{count}++; $user{$$row{userid}}{name} = $$row{username}; } $topfive{$c} = [(sort { $$b[2] <=> $$a[2] } map { [$_, $user{$_}{name}, $user{$_}{count}] } keys %user)[0 .. 4]]; } [download] If you have trouble understanding how any of this code works, just ask about the part you have trouble with, and someone will explain it. This feels like a business problem to me, but on the off chance that it is actually homework, note that your professor will certainly be able to tell you didn't write the above code if you just copy it verbatim. It contains several Perl idioms that a beginning student would not have written. -- We're working on a six-year set of freely redistributable Vacation Bible School materials.	[reply] [d/l]
Re: Counting events and getting top 5 matches by Tanktalus (Canon) on Sep 17, 2008 at 14:19 UTC
Let's see ... I've just recently done something similar... in pure SQL. (Well, there may be some DB2-isms here, but you don't specify your DB vendor.) WITH T AS (SELECT U.NICK, COUNT() AS TOTAL, R.REFERREDNODE FROM REFERU R, USER U WHERE R.REFERREDNODE IS NOT NULL AND R.REFERREDNODE = U.NODEID AND NODEID > 0 AND VISIBLE = 'Y' GROUP BY U.NICK, R.REFERREDNODE ORDER BY COUNT() DESC FETCH FIRST 5 ROWS ONLY), U AS (SELECT T.NICK AS NICK, MAX(R.MSGID) AS MAXID, MIN(R.MSGID) AS MINID FROM USER, T, LOGS L, REFERU R WHERE T.REFERREDNODE = R.REFERREDNODE AND R.MSGID = L.MSGID AND USER.NICK = L.FROM AND VISIBLE = 'Y' AND NODEID > 0 GROUP BY T.NICK) SELECT T.NICK, T.TOTAL, LMAX.FROM, LMIN.FROM FROM T, U, LOGS LMAX, LOGS LMIN WHERE U.MAXID = LMAX.MSGID AND U.MINID = LMIN.MSGID AND T.NICK = U.NICK ORDER BY T.TOTAL DESC, U.MAXID DESC [download] A couple of subselects ... had it been my intention to get out these stats rather than learn SQL, I probably would have just queried for the counts, and then resolved individual items later in other queries. It might have been easier on the db, too, because there'd be far fewer joins. But I still would have got the db to do the counting for me - just to keep the data transfer down, and because the db is probably faster at it than perl. PS - I don't actually have that much data hard-coded, I just didn't want to show the binding. That said, the values are static, the query isn't reused with other values, so leaving them in the query probably isn't a bad thing anyway. PPS - I realise this doesn't directly answer your question - but may give you just another way to look at it.	[reply] [d/l]