iphony has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I've got an SQL statement that goes like
"select code, username, userid from transactions"
"code" is a list of known number that range from 0 to 13. The sql output is stored in a 2-dimension array (just like a table). I need to do two things: 1) get the number of occurrences of each code 2) get the top 5 username/userid pair for each of these code For 1, I can use the array row index (0 to 13) representing the code, then use column 0 as the counter. But how can I achieve point 2? Thanks.

Replies are listed 'Best First'.
Re: Counting events and getting top 5 matches
by Corion (Patriarch) on Sep 17, 2008 at 08:45 UTC

    The only code you show is SQL, so I'm giving you an SQL answer:

    select code, username, userid, count(*) as code_count group by code, username, userid order by code, code_count

    To get the "top five" out of that select statement, you can use the RANK() function:

    select code, username, userid, count(*) as code_count, rank() over (ORDER BY code_count DESC) as code_rank where code_rank <= 5 group by code, username, userid

    The rank() function doesn't seem to be supported on Sybase ASE. Maybe you can create the select as a view and then use some rowid trickery to get at the top five users per code group.

      ++

      BTW, I'm a strong supporter of the view that when you can have the DBMS do the work, you must. You never know when your DB will be on a different machine and having tons of data flowing around the net only to arrive to a program that count them is... illogical.

      Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

        I'm a strong supporter of the view that when you can have the DBMS do the work, you must.

        I'm not so sure that's always the case. There can be a real tradeoff between performance and maintainability there. I've seen some of the hairy twenty-page stored procedures people write when they're trying to do something in SQL that is fundamentally better suited to a general-purpose high-level language and would be less than a screenful of clean, easy-to-read Perl.

        Certainly, there are times when it's better to have the DBMS do the work...

        -- 
        We're working on a six-year set of freely redistributable Vacation Bible School materials.
      without rank() you can simply add a limit 5.

      Or am I missing some obvious glitch here?

        This will give you the overall top 5 users, not the top five users per code group. Given:

        code userid count 1 corion 1000 2 corion 999 3 corion 888 4 corion 777 5 corion 6 1 moritz 555 2 moritz 444 3 moritz 333 4 moritz 222 5 moritz 1

        ordering by count,code would never show moritz, and you'd always miss out on low-counting codes (like code 5). Ordering by code, count would leave out codes with very few users (if you have 1000 users of code 1 and only 3 of code 5).

        With LIMIT 5 you get the top 5 within the whole table. What the OP wanted is the top 5 for every value of code. It can be obtained without RANK but you need a subquery (or perhaps two) and this can reduce performances.

        Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

      Thanks for the help. Actually I wanted to do the operation in Perl, as I already got the sql result into an array. I will try and see how it goes by doing it in SQL.
Re: Counting events and getting top 5 matches
by broomduster (Priest) on Sep 17, 2008 at 08:56 UTC
    Assuming you have your counts in a hash, say %code_count (with code as the keys), then

    use strict; use warnings; my %code_count = ( a => 5, b => 4, c =>10, d => 1, e => 2, f => 17, ); print "$_ $code_count{$_}\n" for (sort { $code_count{$b} <=> $code_count{$a} } keys %code_count + )[0..4]; __DATA__ f 17 c 10 a 5 b 4 e 2
    will work. This sorts the hash by value (in largest-first order), and uses a slice to pull off the top five.

    Updated: Added sample data and output.

Re: Counting events and getting top 5 matches
by jonadab (Parson) on Sep 17, 2008 at 11:14 UTC
    I need to do two things: 1) get the number of occurrences of each code 2) get the top 5 username/userid pair for each of these code

    My SQL isn't up to that sort of processing (I mostly just use SQL for data _retrieval_), so I'm going to go for a Perl answer. I assume you are using DBI and already have a connection to the database in question; I'm going to call the DBI connection $db for lack of a better name.

    my $query = $db->prepare("select code, username, userid from transac +tions"); $query->execute(); my %code = map { $_ => [] } 0 .. 13; while (my $r = $qeury->fetchrow_hashref()) { my $c = $$r{code}; push @{$code{$c}}, $r; } # For each code $c, $code{$c} is now an arrayref which # contains the relevant records (as hashrefs). The # number of elements in the array equals the number # of occurances. Now, for the second part... my %topfive; for my $c (keys %code) { my %user; for my $row (@{$code{$c}}) { # Note: this assumes that the userid field # is unique to each user. $user{$$row{userid}}{count}++; $user{$$row{userid}}{name} = $$row{username}; } $topfive{$c} = [(sort { $$b[2] <=> $$a[2] } map { [$_, $user{$_}{name}, $user{$_}{count}] } keys %user)[0 .. 4]]; }

    If you have trouble understanding how any of this code works, just ask about the part you have trouble with, and someone will explain it.

    This feels like a business problem to me, but on the off chance that it is actually homework, note that your professor will certainly be able to tell you didn't write the above code if you just copy it verbatim. It contains several Perl idioms that a beginning student would not have written.

    -- 
    We're working on a six-year set of freely redistributable Vacation Bible School materials.
Re: Counting events and getting top 5 matches
by Tanktalus (Canon) on Sep 17, 2008 at 14:19 UTC

    Let's see ... I've just recently done something similar... in pure SQL. (Well, there may be some DB2-isms here, but you don't specify your DB vendor.)

    WITH T AS (SELECT U.NICK, COUNT(*) AS TOTAL, R.REFERREDNODE FROM REFERU R, USER U WHERE R.REFERREDNODE IS NOT NULL AND R.REFERREDNODE = U.NODEID AND NODEID > 0 AND VISIBLE = 'Y' GROUP BY U.NICK, R.REFERREDNODE ORDER BY COUNT(*) DESC FETCH FIRST 5 ROWS ONLY), U AS (SELECT T.NICK AS NICK, MAX(R.MSGID) AS MAXID, MIN(R.MSGID) AS MINID FROM USER, T, LOGS L, REFERU R WHERE T.REFERREDNODE = R.REFERREDNODE AND R.MSGID = L.MSGID AND USER.NICK = L.FROM AND VISIBLE = 'Y' AND NODEID > 0 GROUP BY T.NICK) SELECT T.NICK, T.TOTAL, LMAX.FROM, LMIN.FROM FROM T, U, LOGS LMAX, LOGS LMIN WHERE U.MAXID = LMAX.MSGID AND U.MINID = LMIN.MSGID AND T.NICK = U.NICK ORDER BY T.TOTAL DESC, U.MAXID DESC
    A couple of subselects ... had it been my intention to get out these stats rather than learn SQL, I probably would have just queried for the counts, and then resolved individual items later in other queries. It might have been easier on the db, too, because there'd be far fewer joins. But I still would have got the db to do the counting for me - just to keep the data transfer down, and because the db is probably faster at it than perl.

    PS - I don't actually have that much data hard-coded, I just didn't want to show the binding. That said, the values are static, the query isn't reused with other values, so leaving them in the query probably isn't a bad thing anyway.

    PPS - I realise this doesn't directly answer your question - but may give you just another way to look at it.