soon_j has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I need enlightenment on a simple issue: What's an efficient way of retrieving a Rank # from a MySQL table of several hundred/thousand entries?

For example, I have a MySQL Table containing the data of every student including the number of classes they attended, their total scores, grades, and etc. The data from the table could easily be manipulated... like if I choose to display in a HTML Table the top100 students in the class, I'll just have the returns of the query arranged in an "ASC"ending manner, then determine their ranks.

My piece of concern is that if I retrieve the records for a single student... how will I determine its rank without retrieving the entire content of the table and sorting them or is this possible?

From my limited idea, I say this could be possible only if I retrieve all entries in the table and sort them, then from the array of the sorted entries, retrieve the rank and other data of the student. But what if the table becomes so large... speed might be a problem?

Please share your ideas... thanks!

Jay S.
  • Comment on What's an Efficient Way to Retrieve a Rank from a MySQL Table?

Replies are listed 'Best First'.
Re: What's an Efficient Way to Retrieve a Rank from a MySQL Table?
by Abigail-II (Bishop) on May 04, 2004 at 09:24 UTC
    Pseudo SQL code:
    SELECT 1 + COUNT(*) FROM student_data WHERE grade > (SELECT grade FROM student_data WHERE name = "whatever")
    The principle being that someones rank is one more than the number of people that are better. I don't think MySQL supports nested SELECT statements, but you can always split the query into two. If you have an index on the 'grade' column (for simplicity, I assume such a column exists), the query should be pretty fast as the database doesn't need to do a table scan.

    Abigail

      I went through your suggestion and here are my comments:

      Assuming we have the following data:

      student - score
      a - 70
      b - 70
      c - 70
      d - 60
      e - 50

      problem: we need to determine the rank for student d

      From your above suggestion, the result of your query would be "4" (meaning you are the fourth in the list). However since students a, b, and c have same grades, they should all be ranked as "first" while student "d" should be ranked as "second".

      However, your suggestion is very fast and extremely useful if we present the result this way ... "There are 4 students that have higher scores!".

      Would the "SELECT" command return the number of rows returned? If so, we can slightly modify your above suggestion to return results GROUPed by the scores. If the scores are grouped, the modified query would return only 1 row since only the score of 70 is greater than 60. If rank is one more than the number of results that are better, then the rank is correctly reported as "2".
        Ranking student 'd' as second is very unrealistic. It would mean that if student 'a' (who is better than 'd') increases his score to 70.000001, student 'd' drops a rank. That would be counterintuitive. Or, if you are a sports fan, if these scores would be sports scores, 'd' would not get a silver medal.

        Having said that, look into the 'group by' clause. What you want is determine the rank of the group the student falls in.

        Abigail

        It actually depends whether you want to rank a) the students or b) their scores:
        a) student d ist 4th
        b) student d has 2nd best score

        pelagic
      Thank you very much... I see no reason why this shouldn't work... except that MySQL 4.0.18 may not support nested SELECT's (I'm not sure though)... but 4.1.xx may support this. Anyway I can break the queries as you suggested.
Re: What's an Efficient Way to Retrieve a Rank from a MySQL Table?
by Ryszard (Priest) on May 04, 2004 at 08:48 UTC
    If the rank of a student is relative to the population you cant determine the rank on the fly without looking at the population.

    My idea would be to run a job periodically that would do that for you, compute the value and store it statically in a table.

    Alternatively you could write an internal function that did that for you eg: select rank(student) from students.. and the rank function would be something you write yourself. not sure if MySQL supports custom functions in perl, (and i'm too lazy to look it up) but postgres sure does.. (not that i'm suggesting you switch RBDMS's) :-).

      I believe you are right. The rank is highly dependent on the entire population. Since students are taking exams online at indefinite times, their performances vary at a non-predictable manner. To determine a specific rank, I need to take a snapshot from the entire table.

      Previously, I thought of having another table that only holds information such as the rank of the student. The problem with this idea is that everytime a student improves in rank (like from 9th place moving up to 5th place), I need to update the ranks of all entries respectively.

      Thanks for the comments
Re: What's an Efficient Way to Retrieve a Rank from a MySQL Table?
by Anonymous Monk on May 04, 2004 at 09:28 UTC

    Well, if you need constant up-to-date ranking, just don't keep the whole database contents in memory at once. At first, loop through the rows only selecting the columns needed for ranking. Then do your ranking, and then loop through the database a second time, pulling out the rows in the correct order determined by the ranking. Something like the following perhaps?

    my $dbh = DBI->connect( ... ); my $sth1 = $dbh->prepare( 'SELECT studentid, ranker1, ranker2 FROM students' ); $sth1->execute(); my %ranked; while ( my $row = $sth1->fetchrow_arrayref() ) { my $calc_rank = $row->{ranker1} + $row->{ranker2}; %ranked{ $row->{studentid} } = $calc_rank; } my @rank_order = sort { $ranked{$a} <=> $ranked{$b} } keys( %ranked ); my $sth2 = $dbh->prepare( 'SELECT * FROM students WHERE studentid=?' ); for my $studentid ( @rank_order ) { $sth2->execute( $studentid ); my $row = $sth2->fetchrow_hashref(); while ( my($k, $v) = each( %$row ) ) { print( "$k: $v\n" ); } print( "\n\n" ); }
      # oops... while ( my $row = $sth1->fetchrow_arrayref() ) { my $calc_rank = $row->{ranker1} + $row->{ranker2}; %ranked{ $row->{studentid} } = $calc_rank; } # should actually read while ( my $row = $sth1->fetchrow_arrayref() ) { my $calc_rank = $row->[1] + $row->[2]; %ranked{ $row->[0] } = $calc_rank; }
      Thanks. I'll give this a try as well as the other suggestions.