```perl
my @constraints = (
    {mid => 20, sd => 15},
    {mid => 30, sd => 25},
    {mid => 50, sd => 10},
);
```
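As others have implied, I don't think you're going to be able to do this deterministically, because you have over-specified your solution set: the sum of your maxima is greater than 100 and the sum of your minima is less than 100. Treating each allowed range as mid - sd to mid + sd, the minima sum to 5 + 5 + 40 = 50 and the maxima to 35 + 55 + 60 = 150.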
To minimize the number of discarded sets and avoid scaling, what comes to mind is to pick the numbers in descending order of midpoint. After generating each number, check that the remainder is at least the sum of the remaining minima and at most the sum of the remaining maxima. If it's outside those bounds, pick that number again. When you're down to the last number, you don't generate it randomly; it's just the remainder.
Basically, you're pruning off choices that can't satisfy the remaining constraints.
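As a small worked example of the pruning test, here is a hypothetical helper (the name `remainder_is_feasible` is illustrative, not from the post) applied to the constraints from the question. A tentative pick is kept only when what it leaves over lies between the sums of the remaining minima and maxima.

```perl
use strict;
use warnings;

# Hypothetical helper (name and signature are illustrative). Given the amount
# still to be allocated, a tentative pick $n, and the constraints not yet
# filled, report whether the leftover can still be covered by them.
sub remainder_is_feasible {
    my ( $remainder, $n, @remaining ) = @_;
    my ( $min, $max ) = ( 0, 0 );
    for my $c (@remaining) {
        $min += $c->{mid} - $c->{sd};
        $max += $c->{mid} + $c->{sd};
    }
    my $left = $remainder - $n;
    return $left >= $min && $left <= $max;
}

# First pick, for {mid => 50, sd => 10}: 55 leaves 45, and the two
# remaining constraints can absorb anything from 10 to 90.
my @after_first = ( { mid => 30, sd => 25 }, { mid => 20, sd => 15 } );
print remainder_is_feasible( 100, 55, @after_first ) ? "keep\n" : "repick\n";
# prints "keep"

# Second pick, for {mid => 30, sd => 25}: 50 would leave -5, below the
# minimum of 5 for the last constraint, so that pick is pruned.
my @after_second = ( { mid => 20, sd => 15 } );
print remainder_is_feasible( 45, 50, @after_second ) ? "keep\n" : "repick\n";
# prints "repick"
```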
Without thinking it through further, I'm worried that doing it in descending order of midpoint might bias the results. I think you could pick them in random order and you're just more likely to have to repick numbers along the way.
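A minimal sketch of the random-order variant, assuming the same `mid`/`sd` hashes as in the question. `generate_set_random_order` is a hypothetical name; it uses `shuffle` and `sum` from the core List::Util module, and the cap on redraws is an added safeguard rather than part of the approach described here.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Same pruning rule, but constraints are visited in shuffled order rather
# than by descending mid. The constraint visited last takes the remainder.
sub generate_set_random_order {
    my ( $total, @constraints ) = @_;
    my @order = shuffle @constraints;
    my $last  = pop @order;

    # Sums of the bounds of every constraint that has not been filled yet.
    my $min_left = sum map { $_->{mid} - $_->{sd} } @constraints;
    my $max_left = sum map { $_->{mid} + $_->{sd} } @constraints;

    my $remainder = $total;
    my @results;
    for my $c (@order) {
        my ( $lo, $hi ) = ( $c->{mid} - $c->{sd}, $c->{mid} + $c->{sd} );
        my ( $new_min, $new_max ) = ( $min_left - $lo, $max_left - $hi );
        my ( $n, $attempts ) = ( undef, 0 );

        # Redraw until the leftover amount fits the remaining constraints.
        # rand(0) would mean rand(1) in Perl, so a zero-width range is special-cased.
        do {
            die "gave up after 10000 draws\n" if ++$attempts > 10_000;
            $n = $lo + ( $hi > $lo ? rand( $hi - $lo ) : 0 );
        } until $remainder - $n >= $new_min && $remainder - $n <= $new_max;

        push @results, [ $c->{mid}, $c->{sd}, $n ];
        $remainder -= $n;
        ( $min_left, $max_left ) = ( $new_min, $new_max );
    }
    push @results, [ $last->{mid}, $last->{sd}, $remainder ];
    return @results;
}

my @constraints = (
    { mid => 20, sd => 15 },
    { mid => 30, sd => 25 },
    { mid => 50, sd => 10 },
);

for my $r ( generate_set_random_order( 100, @constraints ) ) {
    printf "%5.1f +-%5.1f: %5.1f\n", @$r;
}
```

The results come back in visiting order, but each entry carries its own `mid` and `sd`, so they are still easy to match up.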
Update: Here's a code sample:
```perl
use strict;
use warnings;

sub RandFlat {
    # Return a rand() value with a flat distribution about $mean +- $stdDev
    my ($mean, $stdDev) = @_;
    my $range = 2.0 * $stdDev;
    my $value = rand($range);
    return $value + ($mean - $stdDev);
}

sub generate_set {
    my ($remainder, @constraints) = @_;
    my @results;
    my ($sum_of_minima, $sum_of_maxima);
    for my $c ( @constraints ) {
        $sum_of_minima += $c->{mid} - $c->{sd};
        $sum_of_maxima += $c->{mid} + $c->{sd};
    }

    # iterate through N-1 constraints in descending order
    my @descending = sort { $b->{mid} <=> $a->{mid} } @constraints;
    my $last_value = pop @descending;

    for my $c ( @descending ) {
        my $n = RandFlat( $c->{mid}, $c->{sd} );

        # repeat if remainder outside sum of the remaining
        # minima and maxima constraints
        my $new_remainder = $remainder - $n;
        my $new_sum_of_minima = $sum_of_minima - ( $c->{mid} - $c->{sd} );
        my $new_sum_of_maxima = $sum_of_maxima - ( $c->{mid} + $c->{sd} );
        redo if ( $new_remainder < $new_sum_of_minima )
             || ( $new_remainder > $new_sum_of_maxima );

        # otherwise save number and update the remainder and constraints
        push @results, [ $c->{mid}, $c->{sd}, $n ];
        $remainder = $new_remainder;
        $sum_of_minima = $new_sum_of_minima;
        $sum_of_maxima = $new_sum_of_maxima;
    }

    # the remainder must now satisfy the final constraint
    return @results, [ $last_value->{mid}, $last_value->{sd}, $remainder ];
}

my $total = 100;
my @constraints = (
    {mid => 20, sd => 15},
    {mid => 30, sd => 25},
    {mid => 50, sd => 10},
);

for ( 1 .. 5 ) {
    for my $result ( generate_set($total, @constraints) ) {
        my ($mid, $sd, $value) = @$result;
        printf "%5.1f +-%5.1f: %5.1f", $mid, $sd, $value;
        print " - bad" if ($value < ($mid - $sd)) || ($value > ($mid + $sd));
        print "\n";
    }
    print "\n";
}
```
I ran it 1000 times and didn't see any bad results.
Also, thinking it through again, I don't think the descending order will wind up biased (but I could be convinced by a good argument).
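One way to test the bias question empirically rather than by argument is to compare the average value each constraint receives under both orderings. This is a hypothetical experiment, not something from the post: `draw_set` reimplements the repick rule so the script stands alone, and it assumes every `sd` is positive and the mids are distinct. A clear gap between the two columns of means would suggest the order matters; matching means alone would not prove the distributions identical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

my @constraints = (
    { mid => 20, sd => 15 },
    { mid => 30, sd => 25 },
    { mid => 50, sd => 10 },
);

# Fill the constraints in the order given; the last one takes the
# remainder. Returns a hash of mid => value (assumes the mids are unique
# and every sd is positive, so the redraw loop can terminate).
sub draw_set {
    my ( $total, @order ) = @_;
    my $min_left = sum map { $_->{mid} - $_->{sd} } @order;
    my $max_left = sum map { $_->{mid} + $_->{sd} } @order;
    my $last     = pop @order;
    my %value;
    for my $c (@order) {
        my ( $lo, $hi ) = ( $c->{mid} - $c->{sd}, $c->{mid} + $c->{sd} );
        my ( $new_min, $new_max ) = ( $min_left - $lo, $max_left - $hi );
        my $n;
        do { $n = $lo + rand( $hi - $lo ) }
            until $total - $n >= $new_min && $total - $n <= $new_max;
        $value{ $c->{mid} } = $n;
        $total -= $n;
        ( $min_left, $max_left ) = ( $new_min, $new_max );
    }
    $value{ $last->{mid} } = $total;
    return \%value;
}

# Accumulate per-constraint totals under each ordering.
my $runs = 20_000;
my ( %sum_desc, %sum_shuf );
for ( 1 .. $runs ) {
    my $d = draw_set( 100, sort { $b->{mid} <=> $a->{mid} } @constraints );
    my $s = draw_set( 100, shuffle @constraints );
    $sum_desc{$_} += $d->{$_} for keys %$d;
    $sum_shuf{$_} += $s->{$_} for keys %$s;
}

for my $mid ( sort { $a <=> $b } keys %sum_desc ) {
    printf "mid %2d: descending mean %6.2f, shuffled mean %6.2f\n",
        $mid, $sum_desc{$mid} / $runs, $sum_shuf{$mid} / $runs;
}
```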
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
In reply to "Re: Need technique for generating constrained random data sets" by xdg, in thread "Need technique for generating constrained random data sets" by GrandFather