Sorting Votes, Confidence & Deviations

saintly has asked for the wisdom of the Perl Monks concerning the following question:

Blessed Monks, I beseech thee to shed thy collective illumination on the dark twisting depths of this thorny problem:

I've encountered this same problem different times in different forms. Essentially, I have a list of items that have been voted on by varying numbers of people. I want to sort the list from 'best' to 'worst'.

The part where it gets tricky is when I have different numbers of votes for different items. Sorting by 'Average Rating', an item with only two votes, both '5/5' will be ranked higher than an item with 1000 votes, 999 of which are 5/5 and 1 of which is anything else. Perhaps taking that statistics class would have helped me out here, since I imagine that statisticians run into this sort of thing all the time. In any case, I've been using the following methods to tackle the problem:

Sorting Method	Advantages	Problems
Method1: Average	Very simple. Requires no processing if this is all the data you have. Minor variations: Sort items with same average by #votes Mark items with 'high' average + lots of votes with a Super Gold Star (or something) Report the standard deviation (aka 'confidence', or 'this poll has a margin of error of +/- 2 points')	Items with few votes have fluctuating averages, can be sorted higher than items with lots of votes. After a few dozen votes, almost no items will have a '5/5' perfect score. By using decimals you make the score more confusing; is a 4.93 really better than a 4.92? Offloads the real sorting and determining of value onto the user. Hmm... Should I go with the book rated 4.5 +/- 2.0 with 1000 votes, or the one with 4.9 +/- 4.0 with 3 votes?
Method2: Median	Sorts very well if you have lots of votes. 900 '5's and 100 of anything else becomes a '5'. 9000 '5's and 1000 of anything else also becomes a '5'. This should be fine since you have a lot of confidence in both scores.	Still doesn't solve problem of low-confidence items (with few votes) being placed at a higher level than they should be. Requires knowledge of all votes cast, which is not always available.
Method3: Use average, but eliminate entries with few votes.	You're confident about every item in the list Everything else is sorted where it should be	Items with few votes aren't in the main list and may have a hard time getting more votes. They can be put in a special list of 'brand new' or 'promising' items perhaps.
Method4: 'Magic' Score	Since you can implement any complexity or special-case you desire, a scoring system should work... it's coming up with one that's hard. Coming up with an elegant one that doesn't use 'magic numbers' is even harder.	At some point (depending on the size of your total data set), you're reasonably confident that votes are approaching a 'final' score. After a certain point you don't want (Average '4.9', 2000 votes) sorted below (Average '4.6', 5000 votes). Or do you? Coming up with the 'value' for a given vote, say (1,2,3,4,5) have the values (-4,-1,0,1,4) seems 'magic' sometimes. How can a system determine it's own magic values?

I'm assuming that in all cases, your program will have access to the following data:

A list of all the items
For each item, the #votes cast
For each item, the average score

In some cases, you may also have access to all the votes (500 '1's, 200 '2's, etc...).

For a given system, the votes are all numeric, although some systems use different scales. The Likert (1-5) scale is common. Sometimes it's 1-7 (1-2 are bad, 3-5 are average, 6-7 are good).

I'd like to see something like this:

1000 votes: all 1000 are 5/5
100 votes: 50 '5's, 50 '4's
10 votes: 9 '5's, 1 '1'
3 votes: all '5's
10 votes: all '2's
100 votes: 50 '2's, 50 '1's
3 votes: all '1's (or possibly switched with above line)
1000 votes: all 1000 are 1/5 (lowest)

The best I can express my ideal sorting scheme is:

Items with lots of votes are 'high confidence'. Items with few votes are 'low confidence'.
What constitutes 'lots' and 'few' isn't a set number, but relative for your total set of data.
I'm not sure what a good way to determine 'lots' or 'few' is, but I'm currently using something like 1/4 * (average #votes cast per item) as my 'confidence' score. Anything with less than that is 'very low confidence', 1-2 times that amount is 'low' and everything else is 'high'.
The lower the confidence in the score, the more the item should be sorted near the middle, if included in the overall list at all.

If you handle a voting system, what method(s) do you use to rank items?

Here's a sample program that sorts the scenario above, but probably fails when the vote totals are different:

#!/usr/bin/perl

use strict;
use List::Util qw/sum/;

my @votes = (
  { item => 'worstest',   votes => [ 1000,  0, 0,  0,    0 ] },
  { item => 'promising',  votes => [    1,  0, 0,  0,    9 ] },
  { item => 'bombing',    votes => [    0, 10, 0,  0,    0 ] },
  { item => 'lookinGood', votes => [    0,  0, 0,  0,    3 ] },
  { item => 'bad',        votes => [   50, 50, 0,  0,    0 ] },
  { item => 'good',       votes => [    0,  0, 0, 50,   50 ] },
  { item => 'iffy',       votes => [    3,  0, 0,  0,    0 ] },
  { item => 'bestest',    votes => [    0,  0, 0,  0, 1000 ] },
);

# Ideal sort:
#   bestest, good, promising, lookinGood, bombing, bad, iffy, worstest
# or
#   bestest, good, promising, lookinGood, bombing, iffy, bad, worstest


# These global variables may be useful to your sorting function.
our $TOTAL_VOTES_CAST = sum( map { sum( @{$_->{'votes'}} ) } @votes );
our $TOTAL_ITEMS      = @votes;

print "Best to worst: ";
foreach my $vote ( sort sortingMethod @votes ) {
  print $vote->{'item'}, " : ", &score( @{$vote->{'votes'}} ), "\n";
}

print &validateSortingMethod(@votes) ? "Awesome Power!\n" : "Teh Suck!
+!\n";
exit;


# Gets A & B and uses the scoring system to rank them
sub sortingMethod {
  my $score_a = &score( @{ $a->{'votes'} } );
  my $score_b = &score( @{ $b->{'votes'} } );
  return $score_b <=> $score_a; # Sort highest-to-lowest
}

# Expects a list like:
#   $list[0] == number of '0' votes
#   $list[1] == number of '1' votes, etc...
sub score {
  my( @list ) = @_;
  my @votes = &expandList( @list );

  # Use 'magic values' scoring method
  @votes = map { [-4,-1,0,1,4]->[$_] } @votes;

  # This function doesn't use all these, but you might...
  my $average   = &average( \@votes );
  my $median    = &median( \@votes );
  my $stdDev    = &stdDev( \@votes );
  my $votesCast = @votes;


  return $average * $votesCast;
}

sub average { my $l_ref = shift; return (sum( @$l_ref ) / scalar( @$l_
+ref )); }
sub median {
  my $list_ref = shift;
  my $voteCount = @$list_ref;
  if( $voteCount % 2 ) {
    return $list_ref->[ ($voteCount+1) / 2 ]; # Odd # of votes, pick m
+iddle
  } else {
    return sum ($list_ref->[($voteCount/2)], $list_ref->[($voteCount/2
+)+1]) / 2;
  }
}
sub stdDev {
  my ($list_ref) = @_;
  my $squares = 0;
  foreach my $element ( @$list_ref ) {
    $squares += $element ** 2;
  }
  return sqrt( ($squares/scalar(@$list_ref)) - (&average($list_ref)**2
+) );
}
sub expandList {
  my(@list) = @_;
  my @expanded = ();
  for( my $thisValue = 0; $thisValue < @list; $thisValue++ ) {
    my $voteCount = $list[ $thisValue ];
    push( @expanded, split(/,/, "$thisValue," x $voteCount) );
  }
  return @expanded;
}

sub validateSortingMethod {
  my @sorted = sort sortingMethod @_;
  my $nm = join(",", map { $_->{'item'} } @sorted);
  #print "V [$nm]\n";
  if( ($nm eq "bestest,good,promising,lookinGood,bombing,bad,iffy,wors
+test")||
      ($nm eq "bestest,good,promising,lookinGood,bombing,iffy,bad,wors
+test")
    ) {
    return 1;
  } else { return 0; }
}
[download]

Feel free to also criticize my code. Statistics::Basic implements StdDev, Mean & Median for you, and I expect to use it in the final scoring system(s). I super-searched for titles with 'vote'/'voting' and 'score'/'scoring', but found no hits. If this has already been discussed, I'd appreciate knowing what search terms you used to find the old discussion.

Comment on Sorting Votes, Confidence & Deviations Download Code

Replies are listed 'Best First'.
Re: Sorting Votes, Confidence & Deviations by GrandFather (Saint) on Apr 12, 2007 at 20:41 UTC
The bottom line issue is that you want to show two orthogonal metrics with one number - interest (the number of votes) and rating (the pattern of votes). So the problem becomes "How do I map two parameters to generate a single figure of merit" which is pretty much what you have provided some methods for. Bearing in mind though that you have two orthogonal parameters, a different way to show the vote result is as a table where rows represent (some quantisation of) ranking and columns represent (some quantisation of) interest. Method3 comes close to this concept. DWIM is Perl's answer to Gödel	[reply]
Re: Sorting Votes, Confidence & Deviations by billisdog (Sexton) on Apr 12, 2007 at 21:05 UTC
I think you will have the most luck redefining your question. I'll propose a different measure and then suggest a way to find it here: Basically, you think you are looking for the highest rated item, but here's another way of phrasing this that is more useful- you want to display the item that you believe has the highest percentage chance of being one a user will like. Why don't you use statistics standardly accepted "margin of error" or "confidence" formulas to equalize this out? Here's my proposed algorithm: 1. Assign everything a percentage score (based on pure division. 5/5 with 1000 votes is 100%, 5/5 with 2 votes is 100%, 3/5 with 1000 votes is 60%, etc). 2. Based on the sample size (number of voters) as opposed to your total user base (number of registered users), find the margin of error using the standard formula found in this gif: http://upload.wikimedia.org/math/7/3/a/73a9acf851e5c9fe3cdfcf52a1612cd0.png 3. Penalize every score by decrementing it by the largest assumable negative margin of error (better safe than sorry). This will all work out very nicely due to the laws of statistics. The 5/5 with 100 votes will fall to something like 98%, whereas the 5/5 with only 2 votes will have an absurd margin of error (since 2 tells you almost nothing) and fall to something like 40%, thereby losing out to the only minorly decremented 3/5 with 100 votes, which might fall to something like 58%. What does anybody else think of this algorithm?	[reply]
Re^2: Sorting Votes, Confidence & Deviations by billisdog (Sexton) on Apr 12, 2007 at 21:08 UTC
Just adding this on- the reason I like the above solution so much is that it penalizes low confidence entries, but not with any kind of magic number- it does with it perhaps the most time tested and mathematically backed up method of penalizing low confidence entries ever!	[reply]
Re^2: Sorting Votes, Confidence & Deviations by Anonymous Monk on Apr 13, 2007 at 14:40 UTC
While I agree, with the use of Margin of Error/Confidence values, I think your approach is overly pessimistic. Take for example an entry that gets a single minimum vote (1 or 0) out of 5. This is just as likely to be deceptive and your approach would, incorrectly I believe, drive this value farther below minimum. As this problem is most noticeable only at/near 100% & 0%, I suggest another approach. Generate 2 values, Score + Error and Score - Error. Trim these values to the range in question, and average the Trimmed values for a final score. This would have the effect of moving scores that are pegged at the end of the range inward by Half their possible error. Scores near the center of the range will be left largely unaffected.	[reply]
Re^3: Sorting Votes, Confidence & Deviations by Anonymous Monk on Apr 13, 2007 at 19:35 UTC
I agree that the approach is probably too pessimistic. It essentially takes the view that all uncertainty is downside. I would be reluctant to fudge the calculation of the mean. Which aspect of the data has more "information" to it? The mean or the confidence? That should guide you about how to combine the two. To my eye the least invasive treatment would be to compute the means and the standard errors, and then use the standard error to break ties when sorting by the means. Obviously you need the actual values in the population for this (to compute the standard deviation). For the other ones, maybe you can stick in a "default" standard error (don't know what you're scoring, so who knows).	[reply]
Re: Sorting Votes, Confidence & Deviations by NiJo (Friar) on Apr 12, 2007 at 20:46 UTC
You have two different problems to solve before coding: 1) How to express uncertainty/controversiality? A common approach is relative standard deviation (mean/standard deviation). Or median/quartil. I'd not care too much about the differences. 2) How to combine average with uncertainty? That depends on your application. Does it rate a book that people love or hate and discuss about? You should probably read it, too. Or is it a tool that performs badly in some situations but is good enough for the average job? If you can't anticipate your usage, you should stay away from it. Keeping the two dimensions seperated (and plot it?) leaves that decision to the user. In the classical vendor selection he faces a multidimensional problem anyway, e. g. on price. Or you could report and sort on a different point of the distribution curve. Using average - 2 * sigma roughly means "90% of votes are better than X" (if you assume gaussion distribution)	[reply]
Re: Sorting Votes, Confidence & Deviations by aijin (Monk) on Apr 13, 2007 at 21:06 UTC
As someone who frequents many websites with some variation of a rating system, I thought I'd chime in from the user perspective. The type of "best of" lists that I like are the ones that give me not just a ranked list (ie: the top 10 whatsits) but the ratings and some indication of the number of votes. That helps me, as a user, to decide how accurate the rating is. I'm a fan of the practice of displaying the rating in a different way at different confidence levels (ie: bigger, bolder, different colors) to convey the data. Think: 'tag cloud'. You list "Offloads the real sorting and determining of value onto the user" as a problem with the first scheme. I can see that it could be if you go with a standard deviation as the indicator of the number of votes. Most people wouldn't have the first idea of what to do with that information. I think the average vote, along with the number of votes (or some graphical indication thereof) and an option to see the breakdown of votes is a good system. That way the users can see, in the case of an item with a small number of votes, if a single vote has skewed the average. That's enough blathering on from me....I hope it at least gave you some ideas. Please post again and let us all know what you decide on in the end. I'm very curious! Cheers!	[reply]
Re: Sorting Votes, Confidence & Deviations by punch_card_don (Curate) on Apr 13, 2007 at 15:04 UTC
Essentially, I have a list of items that have been voted on by varying numbers of people. I want to sort the list from 'best' to 'worst'. Actually, what you have is as much a business problem as a programming problem. This is a value call. Your company is collecting this info for a reason - so what weight to they put number of votes versus consistency of votes? How do these numbers relate to their business, or what they're trying to find out? Once that's defined, then the progrmaming can be done to reflect that. For the moment, it sounds like you're going at it the other way around, letting programming define the evaluating criteria of the company. If you're the programmer, I'd be taking this question to those who will be interpreting the data and asking them their priorities. Not because you're lost and want to be told what to do - because you want to ensure that the data anaylisis and representation as accurately as possible reflect the business decision criteria of management. Make you look like the thinking programmer of the bunch to management too. Forget that fear of gravity, Get a little savagery in your life.	[reply]
Re: Sorting Votes, Confidence & Deviations by snoopy (Curate) on Apr 18, 2007 at 05:25 UTC
Another thought is to analyse the data using weighted upvotes and downvotes. For example for each vote vote cast, tally: 5/5 => 2 votes, 4/5 => 1 vote, 3/5 => 0 votes (neutral), 2/5 => -1 votes, 1/5 => -2 votes. Then for each item: popular items get a high positive score unpopular items get a negative score contentious items have high variance the number of votes indicates the level of interest	[reply]