in reply to Need technique for generating constrained random data sets

First of all, it seems quite strange that the mean values for each item do not add up to 100% - are you sure about your model?

Second, if you have N random variables subject to a constraint, you really have only (N-1) free random variables. If the N items must add up to 100, you can choose only (N-1) of them; choose all N independently and you'll end up breaking the constraint with - ehr - probability 1.

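Of course, nobody is saying that you must always choose the same set of (N-1) "free" variables! For each iteration, I would suggest a two-stage process: first pick which (N-1) variables are free for that iteration, then generate random values for them and derive the Nth from the constraint.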

An iteration must be rejected if the (N-1) values do not allow a valid Nth value to be generated. This happens, for example, if the (N-1) values add up to more than 100; the probability of this happening depends on how widely the (N-1) variables are spread. In your case, such a rejection is easy to spot, because it leads to a negative value for the Nth percentage.
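The two stages and the rejection step can be sketched roughly as follows. This is only an illustration in Python, not code from the original question: the function name `sample_constrained`, the `(low, high)` input format and the use of uniform draws are all assumptions, and the check that the derived value stays inside its own range goes slightly beyond the "negative value" test described above.

```python
import random

def sample_constrained(ranges, total=100.0, rng=random, max_tries=10_000):
    """Draw one list of values, one per variable, that sums to `total`.

    ranges: list of (low, high) bounds per variable (hypothetical format).
    """
    n = len(ranges)
    for _ in range(max_tries):
        # Stage 1: pick which variable is derived; the other n-1 are free.
        derived = rng.randrange(n)
        values = [0.0] * n
        for i, (low, high) in enumerate(ranges):
            if i != derived:
                # Assumption: free values are drawn uniformly within their range.
                values[i] = rng.uniform(low, high)
        # Stage 2: the derived value is fixed by the constraint.
        # values[derived] is still 0.0 here, so sum(values) covers only the free ones.
        remainder = total - sum(values)
        low, high = ranges[derived]
        if low <= remainder <= high:
            values[derived] = remainder
            return values
        # Otherwise the whole iteration is rejected and we try again.
    raise RuntimeError("no valid sample found; the ranges may be infeasible")

if __name__ == "__main__":
    print(sample_constrained([(10, 50), (0, 40), (20, 60), (0, 30)]))
```

How often an iteration is rejected depends heavily on the ranges. With seven variables each allowed 0..100 and uniform draws, the six free values must sum to at most 100, which happens only about once in 720 iterations (1/6!), so narrower ranges pay off quickly.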

The above process should also cope with the "mean values do not add up to 100" issue, though I would repeat my suggestion to verify your model on this point.

Flavio
perl -ple'$_=reverse' <<<ti.xittelop@oivalf

Don't fool yourself.

Re^2: Need technique for generating constrained random data sets
by GrandFather (Saint) on Feb 08, 2007 at 08:57 UTC

    The backstory is that this is from some inherited code that tries to guess what diet might generate a particular isotope balance measured in bones from an archeological dig. The current technique identifies seven food groups that alter the isotope balance of three elements in different ways. There is not enough information to determine the diet directly from the isotope information so some other technique is required.

    The technique adopted by the software I'm working on generates random diets comprising some proportion of each dietary element. It then calculates the expected isotope balance each diet would generate and records information for those that generate a balance close to the target balance.
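    A minimal sketch of that generate-and-test loop, in Python rather than the inherited code's language. Everything here is hypothetical: the function names, the linear mixing model used for the expected balance, and the simple normalisation used to produce a random diet are stand-ins, since the thread does not say how the real software does either step.

```python
import random

def random_diet(n_groups, rng):
    """Random proportions summing to 1 (plain normalisation, an assumed stand-in)."""
    weights = [rng.random() for _ in range(n_groups)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_balance(diet, signatures):
    """Assumed linear mixing: each element's value is the diet-weighted mean
    of the per-food-group values in `signatures` (one row per food group)."""
    n_elements = len(signatures[0])
    return [sum(p * sig[e] for p, sig in zip(diet, signatures))
            for e in range(n_elements)]

def search(signatures, target, tolerance, n_trials, rng):
    """Record every random diet whose expected balance is within
    `tolerance` of `target` on every element."""
    hits = []
    for _ in range(n_trials):
        diet = random_diet(len(signatures), rng)
        balance = expected_balance(diet, signatures)
        if all(abs(b - t) <= tolerance for b, t in zip(balance, target)):
            hits.append(diet)
    return hits
```

    Here `signatures` would hold seven rows of three values each (seven food groups, three elements), and `target` the three values measured in the bone sample.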

    The search space can be narrowed somewhat if you can provide constraints on the proportion of the diet that can be contributed by each component. For example, it is not possible to live on a diet of shellfish alone, so a limit can be placed on the maximum contribution that shellfish could make to the diet without causing death in short order (about as long as eating nothing at all, actually!).

    So the constraints represent the allowable range for the proportion each dietary element may contribute to the total diet. Obviously, for any particular diet the total of the contributing components is 100%.

    A simple set of constraints would be to set the mean for each element to 50 and the range to ±50, that is, allow any value from 0 to 100. In that case the sum of the seven means would be 350. Completely legitimate in the context of the problem space, but it doesn't constrain the search much!
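    To make the arithmetic concrete, here is a small Python sketch. The `(mean, half_range)` representation and both helper names are assumptions made for illustration, not the format the real software uses. It shows that the means need not sum to 100 for a set of constraints to be satisfiable; what matters is whether 100 lies between the sum of the lower bounds and the sum of the upper bounds.

```python
def to_bounds(constraints):
    """Convert (mean, half_range) pairs to (low, high), clipped to 0..100."""
    return [(max(0.0, m - r), min(100.0, m + r)) for m, r in constraints]

def is_feasible(bounds, total=100.0):
    """The total is reachable only if it lies between the sum of the lows
    and the sum of the highs."""
    return sum(lo for lo, _ in bounds) <= total <= sum(hi for _, hi in bounds)

constraints = [(50, 50)] * 7                # seven food groups, each allowed 0..100
print(sum(m for m, _ in constraints))       # 350
print(is_feasible(to_bounds(constraints)))  # True
```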


    DWIM is Perl's answer to Gödel