Complex sorting and counting question.

nefertiti has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I was looking for a solution to my problem of getting the top x matches for each error category. I already have the entire table stored in an array - @dataarray. I used a hash to get the count for each category. Colum 3 of the array holds the error code. The below is a simple code showing how I get the count for each category.

my %error = (
    'disk' => 0,
    'cpu' => 0,
    'hdd' => 0,
    'graphics' => 0,
    'keyboard' => 0,
    '' => 0 #no error code
    );

for my $i (0 .. $#dataarray) {
    $error{$dataarray[$i][3]}++;
}
[download]

Column 0 and column 1 holds the computer location and name (respectively). They are unique only as a pair. Can any experts advice how I can get the top x matches for each category? In addition, I also need the error occurance count for each of the top x computer location/name. Thanks. Nefertiti

Comment on Complex sorting and counting question. Download Code

Replies are listed 'Best First'.
Re: Complex sorting and counting question. by Corion (Patriarch) on Sep 18, 2008 at 14:40 UTC
What part of the answer you got in Revisiting the "Getting top x matches" thread do you have problems with? Maybe you can show us some more code and detail where exactly your code returns the wrong results. Also see Counting events and getting top 5 matches, which treats a problem not entirely dissimilar to your problem.	[reply]
Re^2: Complex sorting and counting question. by nefertiti (Initiate) on Sep 19, 2008 at 02:19 UTC
Hi Monks, Below is the code provided by someone. What I do'nt get is the `%code = map` onwards. I'm not sure what's he's trying to get. I have difficulty understanding this part. Thanks for the help. my $query = $db->prepare("select code, username, userid from transacti +ons"); $query->execute(); my %code = map { $_ => [] } 0 .. 13; while (my $r = $qeury->fetchrow_hashref()) { my $c = $$r{code}; push @{$code{$c}}, $r; } # For each code $c, $code{$c} is now an arrayref which # contains the relevant records (as hashrefs). The # number of elements in the array equals the number # of occurances. Now, for the second part... my %topfive; for my $c (keys %code) { my %user; for my $row (@{$code{$c}}) { # Note: this assumes that the userid field # is unique to each user. $user{$$row{userid}}{count}++; $user{$$row{userid}}{name} = $$row{username}; } $topfive{$c} = [(sort { $$b[2] <=> $$a[2] } map { [$_, $user{$_}{name}, $user{$_}{count}] } keys %user)[0 .. 4]]; } [download]	[reply] [d/l] [select]
Re^3: Complex sorting and counting question. by Corion (Patriarch) on Sep 19, 2008 at 06:34 UTC
Maybe you should ask the monk who wrote it directly then? That's why we have threaded replies here. Anyway: This prepares and executes an SQL statement, see DBI: `my $query = $db->prepare("select code, username, userid from transacti +ons"); $query->execute();` [download] This initializes the `%code` hash with the keys from 0 to 13. See perlop about the range operator, map and then Data::Dumper for how to inspect data structures. `my %code = map { $_ => [] } 0 .. 13;` [download] This line contains a typo that `use strict;` will catch for you. See strict and likely then Coping with Scoping. `while (my $r = $qeury->fetchrow_hashref()) {` [download] This loop overall fetches each retrieved hash and puts it into the corresponding array in `%code`: `my $c = $$r{code}; push @{$code{$c}}, $r; } # For each code $c, $code{$c} is now an arrayref which # contains the relevant records (as hashrefs). The # number of elements in the array equals the number # of occurances. Now, for the second part...` [download] This code determines the top five userids. `my %topfive; for my $c (keys %code) { my %user; for my $row (@{$code{$c}}) { # Note: this assumes that the userid field # is unique to each user. $user{$$row{userid}}{count}++; $user{$$row{userid}}{name} = $$row{username}; } $topfive{$c} = [(sort { $$b[2] <=> $$a[2] } map { [$_, $user{$_}{name}, $user{$_}{count}] } keys %user)[0 .. 4]]; }` [download] I already recommended pushing more work back into the database instead of manually sorting and counting.	[reply] [d/l] [select]