comment on

Blessed Monks, I beseech thee to shed thy collective illumination on the dark twisting depths of this thorny problem:

I've encountered this same problem different times in different forms. Essentially, I have a list of items that have been voted on by varying numbers of people. I want to sort the list from 'best' to 'worst'.

The part where it gets tricky is when I have different numbers of votes for different items. Sorting by 'Average Rating', an item with only two votes, both '5/5' will be ranked higher than an item with 1000 votes, 999 of which are 5/5 and 1 of which is anything else. Perhaps taking that statistics class would have helped me out here, since I imagine that statisticians run into this sort of thing all the time. In any case, I've been using the following methods to tackle the problem:

Sorting Method	Advantages	Problems
Method1: Average	Very simple. Requires no processing if this is all the data you have. Minor variations: Sort items with same average by #votes Mark items with 'high' average + lots of votes with a Super Gold Star (or something) Report the standard deviation (aka 'confidence', or 'this poll has a margin of error of +/- 2 points')	Items with few votes have fluctuating averages, can be sorted higher than items with lots of votes. After a few dozen votes, almost no items will have a '5/5' perfect score. By using decimals you make the score more confusing; is a 4.93 really better than a 4.92? Offloads the real sorting and determining of value onto the user. Hmm... Should I go with the book rated 4.5 +/- 2.0 with 1000 votes, or the one with 4.9 +/- 4.0 with 3 votes?
Method2: Median	Sorts very well if you have lots of votes. 900 '5's and 100 of anything else becomes a '5'. 9000 '5's and 1000 of anything else also becomes a '5'. This should be fine since you have a lot of confidence in both scores.	Still doesn't solve problem of low-confidence items (with few votes) being placed at a higher level than they should be. Requires knowledge of all votes cast, which is not always available.
Method3: Use average, but eliminate entries with few votes.	You're confident about every item in the list Everything else is sorted where it should be	Items with few votes aren't in the main list and may have a hard time getting more votes. They can be put in a special list of 'brand new' or 'promising' items perhaps.
Method4: 'Magic' Score	Since you can implement any complexity or special-case you desire, a scoring system should work... it's coming up with one that's hard. Coming up with an elegant one that doesn't use 'magic numbers' is even harder.	At some point (depending on the size of your total data set), you're reasonably confident that votes are approaching a 'final' score. After a certain point you don't want (Average '4.9', 2000 votes) sorted below (Average '4.6', 5000 votes). Or do you? Coming up with the 'value' for a given vote, say (1,2,3,4,5) have the values (-4,-1,0,1,4) seems 'magic' sometimes. How can a system determine it's own magic values?

I'm assuming that in all cases, your program will have access to the following data:

A list of all the items
For each item, the #votes cast
For each item, the average score

In some cases, you may also have access to all the votes (500 '1's, 200 '2's, etc...).

For a given system, the votes are all numeric, although some systems use different scales. The Likert (1-5) scale is common. Sometimes it's 1-7 (1-2 are bad, 3-5 are average, 6-7 are good).

I'd like to see something like this:

1000 votes: all 1000 are 5/5
100 votes: 50 '5's, 50 '4's
10 votes: 9 '5's, 1 '1'
3 votes: all '5's
10 votes: all '2's
100 votes: 50 '2's, 50 '1's
3 votes: all '1's (or possibly switched with above line)
1000 votes: all 1000 are 1/5 (lowest)

The best I can express my ideal sorting scheme is:

Items with lots of votes are 'high confidence'. Items with few votes are 'low confidence'.
What constitutes 'lots' and 'few' isn't a set number, but relative for your total set of data.
I'm not sure what a good way to determine 'lots' or 'few' is, but I'm currently using something like 1/4 * (average #votes cast per item) as my 'confidence' score. Anything with less than that is 'very low confidence', 1-2 times that amount is 'low' and everything else is 'high'.
The lower the confidence in the score, the more the item should be sorted near the middle, if included in the overall list at all.

If you handle a voting system, what method(s) do you use to rank items?

Here's a sample program that sorts the scenario above, but probably fails when the vote totals are different:

#!/usr/bin/perl

use strict;
use List::Util qw/sum/;

my @votes = (
  { item => 'worstest',   votes => [ 1000,  0, 0,  0,    0 ] },
  { item => 'promising',  votes => [    1,  0, 0,  0,    9 ] },
  { item => 'bombing',    votes => [    0, 10, 0,  0,    0 ] },
  { item => 'lookinGood', votes => [    0,  0, 0,  0,    3 ] },
  { item => 'bad',        votes => [   50, 50, 0,  0,    0 ] },
  { item => 'good',       votes => [    0,  0, 0, 50,   50 ] },
  { item => 'iffy',       votes => [    3,  0, 0,  0,    0 ] },
  { item => 'bestest',    votes => [    0,  0, 0,  0, 1000 ] },
);

# Ideal sort:
#   bestest, good, promising, lookinGood, bombing, bad, iffy, worstest
# or
#   bestest, good, promising, lookinGood, bombing, iffy, bad, worstest


# These global variables may be useful to your sorting function.
our $TOTAL_VOTES_CAST = sum( map { sum( @{$_->{'votes'}} ) } @votes );
our $TOTAL_ITEMS      = @votes;

print "Best to worst: ";
foreach my $vote ( sort sortingMethod @votes ) {
  print $vote->{'item'}, " : ", &score( @{$vote->{'votes'}} ), "\n";
}

print &validateSortingMethod(@votes) ? "Awesome Power!\n" : "Teh Suck!
+!\n";
exit;


# Gets A & B and uses the scoring system to rank them
sub sortingMethod {
  my $score_a = &score( @{ $a->{'votes'} } );
  my $score_b = &score( @{ $b->{'votes'} } );
  return $score_b <=> $score_a; # Sort highest-to-lowest
}

# Expects a list like:
#   $list[0] == number of '0' votes
#   $list[1] == number of '1' votes, etc...
sub score {
  my( @list ) = @_;
  my @votes = &expandList( @list );

  # Use 'magic values' scoring method
  @votes = map { [-4,-1,0,1,4]->[$_] } @votes;

  # This function doesn't use all these, but you might...
  my $average   = &average( \@votes );
  my $median    = &median( \@votes );
  my $stdDev    = &stdDev( \@votes );
  my $votesCast = @votes;


  return $average * $votesCast;
}

sub average { my $l_ref = shift; return (sum( @$l_ref ) / scalar( @$l_
+ref )); }
sub median {
  my $list_ref = shift;
  my $voteCount = @$list_ref;
  if( $voteCount % 2 ) {
    return $list_ref->[ ($voteCount+1) / 2 ]; # Odd # of votes, pick m
+iddle
  } else {
    return sum ($list_ref->[($voteCount/2)], $list_ref->[($voteCount/2
+)+1]) / 2;
  }
}
sub stdDev {
  my ($list_ref) = @_;
  my $squares = 0;
  foreach my $element ( @$list_ref ) {
    $squares += $element ** 2;
  }
  return sqrt( ($squares/scalar(@$list_ref)) - (&average($list_ref)**2
+) );
}
sub expandList {
  my(@list) = @_;
  my @expanded = ();
  for( my $thisValue = 0; $thisValue < @list; $thisValue++ ) {
    my $voteCount = $list[ $thisValue ];
    push( @expanded, split(/,/, "$thisValue," x $voteCount) );
  }
  return @expanded;
}

sub validateSortingMethod {
  my @sorted = sort sortingMethod @_;
  my $nm = join(",", map { $_->{'item'} } @sorted);
  #print "V [$nm]\n";
  if( ($nm eq "bestest,good,promising,lookinGood,bombing,bad,iffy,wors
+test")||
      ($nm eq "bestest,good,promising,lookinGood,bombing,iffy,bad,wors
+test")
    ) {
    return 1;
  } else { return 0; }
}
[download]

Feel free to also criticize my code. Statistics::Basic implements StdDev, Mean & Median for you, and I expect to use it in the final scoring system(s). I super-searched for titles with 'vote'/'voting' and 'score'/'scoring', but found no hits. If this has already been discussed, I'd appreciate knowing what search terms you used to find the old discussion.

In reply to Sorting Votes, Confidence & Deviations by saintly

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.