in reply to Need technique for generating constrained random data sets

Howdy!

When a trial set is bad, you either toss it entirely and start over, or you bring the "bad" value into range by taking from the values that are in range. This gets more interesting if more than one value is out of range. If you need four values, and the third can't satisfy its constraints, the fourth won't have any of the pie left for it.

Hmmm... I'm visualizing a pie divider that has n cutters, with the spacing between each cutter constrained to x +/- y.

I'm too lazy right now to work up any code fragments.

I'm not visualizing an approach that will reliably produce valid values without some sort of iteration, either by tossing entire sets that fail, or by adjusting the values. If you pick values for each component without regard to the total and then normalize them so the sum is 100, you will get fewer bad sets, but I can see how normalizing could push a value near the limit out of bounds.
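Here is a rough sketch of the "pick, normalize, toss the set if it fails" variant. It assumes Python, uniform picks within each range, and the three constraints from the example in this thread; the function names and the retry cap are made up for illustration.

```python
import random

# (centre, tolerance) pairs, i.e. 20.0 +- 15.0, 30.0 +- 25.0, 50.0 +- 10.0
CONSTRAINTS = [(20.0, 15.0), (30.0, 25.0), (50.0, 10.0)]

def in_range(values, constraints):
    """True if every value lies within centre +- tolerance."""
    return all(c - t <= v <= c + t for v, (c, t) in zip(values, constraints))

def normalize(values, total=100.0):
    """Scale the values so that they sum to total."""
    scale = total / sum(values)
    return [v * scale for v in values]

def random_set(constraints, total=100.0, max_tries=10_000, rng=random):
    """Pick each value without regard to the total, normalize, and retry on failure."""
    for _ in range(max_tries):
        picks = [rng.uniform(c - t, c + t) for c, t in constraints]
        candidate = normalize(picks, total)
        if in_range(candidate, constraints):
            return candidate
    raise RuntimeError("no valid set found; the constraints may be infeasible")

print(random_set(CONSTRAINTS))
```

The retry cap is only there so that infeasible constraints fail loudly instead of looping forever.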

20.0 +- 15.0: 22.7
30.0 +- 25.0: 40.1
50.0 +- 10.0: 37.2 - bad
  1. pick values, say (22.7, 40.1, 57.3) - sum = 120.1
  2. normalize by multiplying by 100/120.1 -> (18.9, 33.4, 47.7) - sum = 100 -> happiness
  3. pick values, say (32.7, 54.3, 42.9) -> sum = 129.9
  4. normalize by multiplying by 100/129.9 -> (25.2, 41.8, 33.0) - sum = 100 but third number too small -> not happiness
  5. pin out-of-range value to lower limit -> (25.2, 41.8, 40) - sum = 107
  6. renormalize by multiplying values in range by 60/67 (their share/their sum) -> (22.6, 37.4, 40) - sum = 100 -> happiness
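
Steps 3 through 6 can be sketched roughly like this. It is an illustration in Python, not tested against anything beyond the numbers above; the function name is invented, and it assumes positive starting values and simply gives up if every value ends up pinned.

```python
# (centre, tolerance) pairs from the example above
CONSTRAINTS = [(20.0, 15.0), (30.0, 25.0), (50.0, 10.0)]

def pin_and_renormalize(values, constraints, total=100.0):
    """Normalize, pin any out-of-range value to its limit, rescale the rest, repeat."""
    lows = [c - t for c, t in constraints]
    highs = [c + t for c, t in constraints]
    if not sum(lows) <= total <= sum(highs):
        raise ValueError("constraints cannot sum to the requested total")
    result = list(values)
    pinned = [False] * len(result)
    while True:
        free = [i for i in range(len(result)) if not pinned[i]]
        if not free:
            raise RuntimeError("every value is pinned; nothing left to adjust")
        # The free values share whatever the pinned values leave over.
        share = total - sum(result[i] for i in range(len(result)) if pinned[i])
        scale = share / sum(result[i] for i in free)
        for i in free:
            result[i] *= scale
        out = [i for i in free if not lows[i] <= result[i] <= highs[i]]
        if not out:
            return result
        # Pin each offender to the limit it crossed, then go around again.
        for i in out:
            result[i] = min(max(result[i], lows[i]), highs[i])
            pinned[i] = True

print([round(v, 1) for v in pin_and_renormalize([32.7, 54.3, 42.9], CONSTRAINTS)])
# [22.6, 37.4, 40.0]
```

Each pass either finishes or pins at least one more value, so the loop cannot run longer than the number of values.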

Yeah, it's iterative, but so long as the constraints allow a result, it will converge. It's that mechanical pie divider thingy. Sometimes, one of the dividers runs up against its stops and becomes pinned.

yours,
Michael

Re^2: Need technique for generating constrained random data sets
by GrandFather (Saint) on Feb 08, 2007 at 09:31 UTC

    Sadly, pinning values does not lead to happiness because it mucks up the distribution of values - the pinned values are likely to be vastly overrepresented.
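
One rough way to gauge the effect is to count how often a normalized set needs at least one pin. The sketch below assumes Python, uniform picks, and the constraints from the parent post; the function name is made up. Every set it counts would end up with a value sitting exactly on a limit, which a continuous draw would essentially never produce on its own.

```python
import random

# (centre, tolerance) pairs from the parent post's example
CONSTRAINTS = [(20.0, 15.0), (30.0, 25.0), (50.0, 10.0)]

def fraction_needing_a_pin(trials, seed=0):
    """Fraction of normalized trial sets with at least one out-of-range value."""
    rng = random.Random(seed)
    bounds = [(c - t, c + t) for c, t in CONSTRAINTS]
    needing_pin = 0
    for _ in range(trials):
        picks = [rng.uniform(lo, hi) for lo, hi in bounds]
        scale = 100.0 / sum(picks)
        if any(not lo <= p * scale <= hi for p, (lo, hi) in zip(picks, bounds)):
            needing_pin += 1
    return needing_pin / trials

print(fraction_needing_a_pin(10_000))
```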


    DWIM is Perl's answer to Gödel