in reply to accurately rounding numbers for percentages
ysth's node above has a link to a nice essay on fudging numbers so that they round up to 100. Apparently in the author's company, they fudge the numbers to add up to 100 so that the help desk isn't inundated with complaints about "mistakes" in the reports the publish. So there may be some situations where, reality aside, one may really need to make those numbers add up to 100!
The question then becomes how to do this so that one minimizes mistaken impressions. One's choice will depend a great deal on how one expects people to view the numbers. If one thinks that readers are making judgements based on absolute percentages then you will want to add your fudge factor to the largest numbers. Adding 1 to 1% doubles it whereas adding 1 to 98% is rather insignificant.
However, percentages are relative measures by nature. Thus one might also assume that readers are making judgements based on relative percentages more than absolute percentages. In that case, one might argue that fudge factors should be randomly to the percentages to avoid bias. I don't know which is best. I found several articles on subjective perceptions of statistics via google, but most of them were from paid collections and would have required a trip to the university library. Unfortunately, I didn't have the time to look them up.
The article ysth linked to also had a nice sample of test data, so I decided to work up the case of random assignment of fudge factors along with a test suite based on Test::More.
The test suite is wrapped in a subroutine, runTests to make it easier to test alternative algorithms. If you would like to try your own alternate algorithm against the test suite, pass a code reference. Alternate fudging routines should accept two parameters: ($precision, $aHistogram). $precision is the number of decimal digits in your total. For example, if $precision == 2 then your percentages must add up to 100.00. $aHistograph is a histogram whose numbers can add up to anything. The fudging subroutine is responsible for converting them to percentages.
use strict; use warnings; #--------------------------------------------------------- # Fudge routine using random assignment of fudge factors #--------------------------------------------------------- sub fudge { my ($iPrecision,$aHistogram) = @_; # get percentage of whole # but scale up percentages so that displayed digits are # all on left of decimal point my $sum=0; $sum+=$_ for (@$aHistogram); my $scale=10**$iPrecision; my $target=100*$scale; my @aScaled=map {int($target*$_/$sum) } @$aHistogram; $sum=0; $sum+=$_ for (@aScaled); # calculate +/-1 fudge factor for each term my $signum=$sum<$target?1:-1; my $err=abs($target-$sum); my @aFudge; my $count=scalar(@aScaled); for (1..$err) { my $i=rand $count; while (defined($aFudge[$i])) { $i=rand $count; } $aFudge[$i]=$signum; } # randomly apply +/-1 fudge factor to terms # no more than one per term # print STDERR "input: <$iPrecision> <$sum> <$err> <$signum>:<@aScal +ed>\n"; for (0..$#aScaled) { my $iFudge=$aFudge[$_]; $aScaled[$_]+=$iFudge if defined($iFudge); $aScaled[$_]/=$scale; } # print STDERR "output:<@aScaled>\n"; return @aScaled; } #--------------------------------------------------------- # Test suite #--------------------------------------------------------- use Test::More tests => 29; sub runTests { my $crFudge=shift; while (my $line=<DATA>) { chomp $line; my @aFlds=split(/\s+/,$line); my $iPrecision=shift @aFlds; my $sum=0; $sum+=$_ foreach $crFudge->($iPrecision,\@aFlds); my $sFormat="%.${iPrecision}f"; is(sprintf($sFormat,$sum), sprintf($sFormat,100) ,"<$iPrecision> <@aFlds>"); } } runTests(\&fudge); #--------------------------------------------------------- # Test data #--------------------------------------------------------- # col1: precision, e.g. 2 means all percentages must add up to 100.00 __DATA__ 0 100 100 100 1 100 100 100 2 100 100 100 0 7 7 7 1 7 7 7 2 7 7 7 1 30 30 10 1 73 1 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1 3 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1 3 2300 3400 4500 5600 6799 2 20 0 10 1 50 0 1 4 7 0 7 7 3 3 1 0 10 10 30 1 10 10 30 2 10 10 30 0 10 20 30 30 1 10 20 30 30 2 10 20 30 30 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 42 0 2 2 1 1 1 1 0 2 1 1 1 1 10 1 1 1 0 3570 262 3721 1498 3270 3585 3920 2740 306
Best, beth
|
|---|