in reply to Need technique for generating constrained random data sets

Grandfather,

Your use of sd seems to indicate you are thinking of Standard Deviation for your boundaries.

Know then that a Standard Deviation is not a hard boundary and that it is perfectly OK for an individual value to lie outside the mean plus or minus one Standard Deviation.

Chebyshev's inequality states that at least 50% of the values in your set will lie within about 1.4 (more precisely, the square root of 2) standard deviations of the mean. As a corollary, up to 50% of your values may be more than 1.4 times the Standard Deviation away from your mean. It probably also means that you cannot have a flat distribution for your data if you have to simulate a certain Standard Deviation.
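
To make the Chebyshev figure concrete, here is a minimal sketch in Python (not Perl, and not taken from the original question). The function names, the 0..100 range, the sample size and the seed are all assumptions chosen for illustration. It computes the Chebyshev lower bound for k = sqrt(2) and compares it with what a flat distribution on 0..100 actually delivers.

```python
import math
import random

def chebyshev_min_fraction(k):
    """Lower bound on the fraction of values within k standard deviations of the mean."""
    if k <= 1:
        return 0.0  # the inequality gives no useful bound for k <= 1
    return 1.0 - 1.0 / (k * k)

def fraction_within(values, k):
    """Observed fraction of values within k (population) standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(1 for v in values if abs(v - mean) <= k * sd) / n

# Hypothetical data set: flat (uniform) values between 0 and 100.
rng = random.Random(42)  # fixed seed, only so that the run is repeatable
flat = [rng.uniform(0, 100) for _ in range(100_000)]

k = math.sqrt(2)                       # roughly 1.4
bound = chebyshev_min_fraction(k)      # guaranteed minimum for any distribution
observed = fraction_within(flat, k)    # what the flat data actually shows
flat_sd = 100 / math.sqrt(12)          # theoretical sd of a flat distribution on 0..100

print(f"Chebyshev bound: {bound:.2f}, observed: {observed:.3f}, flat sd: {flat_sd:.2f}")
```

The bound is only a worst case, so the flat data sits well above it. Note also that a flat distribution on fixed bounds has its Standard Deviation pinned at (max - min) / sqrt(12), which is one way to see why fixed bounds, a flat shape and a freely chosen Standard Deviation do not generally go together.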

But then again perhaps you did not think of Standard Deviation at all when asking this question!

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re^2: Need technique for generating constrained random data sets
by xdg (Monsignor) on Feb 07, 2007 at 22:41 UTC

    I suspect that the original idea was to use a normal distribution, but that the switch to a flat distribution was made because of the boundaries at 0 and 100.

    An alternative would be to use a bounded probability distribution like the Beta distribution or the Kumaraswamy distribution.
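
    As a rough sketch of the Beta idea (in Python rather than Perl, and not anything from Math::Random::OO): the helper names, the 0..100 bounds and the target mean of 30 with sd of 10 are all assumptions made up for illustration. The shape parameters come from the standard method-of-moments formulas, and the draws are rescaled from 0..1 to the desired range.

```python
import math
import random

def beta_params(mean, sd, lo=0.0, hi=100.0):
    """Method-of-moments Beta shape parameters for a target mean and sd on [lo, hi]."""
    m = (mean - lo) / (hi - lo)      # mean rescaled to the 0..1 interval
    v = (sd / (hi - lo)) ** 2        # variance rescaled to the 0..1 interval
    if not 0.0 < m < 1.0:
        raise ValueError("mean must lie strictly inside the bounds")
    if v >= m * (1.0 - m):
        raise ValueError("sd is too large for this mean and these bounds")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

def bounded_samples(n, mean, sd, lo=0.0, hi=100.0, rng=random):
    """Draw n values in [lo, hi] whose mean and sd approximate the targets."""
    alpha, beta = beta_params(mean, sd, lo, hi)
    return [lo + (hi - lo) * rng.betavariate(alpha, beta) for _ in range(n)]

# Hypothetical targets: mean 30 and sd 10 on a 0..100 scale.
rng = random.Random(1)  # fixed seed, only so that the run is repeatable
values = bounded_samples(50_000, 30.0, 10.0, rng=rng)

sample_mean = sum(values) / len(values)
sample_sd = math.sqrt(sum((x - sample_mean) ** 2 for x in values) / len(values))
print(f"mean {sample_mean:.2f}, sd {sample_sd:.2f}, "
      f"min {min(values):.2f}, max {max(values):.2f}")
```

    Unlike a clipped normal, no value can ever fall outside the bounds, and the check in beta_params makes explicit that not every sd is reachable for a given mean and range.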

    I keep meaning to implement Kumaraswamy in Math::Random::OO one of these days. (Right after I fix it to use Math::Random::MT::Auto as the underlying generator and recode other transformations in XS.)
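
    For what it is worth, Kumaraswamy is attractive here because its CDF, 1 - (1 - x^a)^b, inverts in closed form, so sampling needs only a uniform random number. The following is a hedged Python sketch of that inverse-CDF approach. It is not the Math::Random::OO interface, and the function names, the shape values a = 2 and b = 5, and the 0..100 scaling are assumptions for illustration.

```python
import random

def kumaraswamy_cdf(x, a, b):
    """CDF of the Kumaraswamy(a, b) distribution for x in [0, 1]."""
    return 1.0 - (1.0 - x ** a) ** b

def kumaraswamy_quantile(u, a, b):
    """Inverse of the CDF: maps a uniform u in [0, 1) to a Kumaraswamy value in [0, 1)."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def kumaraswamy_sample(a, b, lo=0.0, hi=100.0, rng=random):
    """One draw rescaled to [lo, hi], using inverse-CDF sampling."""
    return lo + (hi - lo) * kumaraswamy_quantile(rng.random(), a, b)

# Hypothetical shape parameters; a = b = 1 would reduce to a flat distribution.
rng = random.Random(7)  # fixed seed, only so that the run is repeatable
draws = [kumaraswamy_sample(2.0, 5.0, rng=rng) for _ in range(10_000)]
print(f"min {min(draws):.2f}, max {max(draws):.2f}, "
      f"mean {sum(draws) / len(draws):.2f}")
```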

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re^2: Need technique for generating constrained random data sets
by GrandFather (Saint) on Feb 08, 2007 at 09:02 UTC

    See Re^2: Need technique for generating constrained random data sets for the back story. The code does include an (unused) provision for using a normal distribution. However, given the nature of the search and the uncertainties in at least some elements of the diet, a flat distribution is probably more appropriate than a normal distribution - not my call, however.


    DWIM is Perl's answer to Gödel