in reply to Perl rand() generates larger numbers for small sample size, bug!

Why are random numbers important to you? What are you using them for? If truly random numbers matter to you, use a better source.

The documentation for rand says this:

> "rand()" is not cryptographically secure. You should not rely on it in security-sensitive situations. As of this writing, a number of third-party CPAN modules offer random number generators intended by their authors to be cryptographically secure, including: Data::Entropy, Crypt::Random, Math::Random::Secure, and Math::TrulyRandom.

> This average should be close to 0.5.

Really? Here's a good point to start your research: http://www.random.org/analysis/

> It seems that with smaller sample size, rand() functions is biased to produce larger numbers but not the smaller number.

If the fault really lay with rand and not with something else, how would rand know the sample size and adjust its output accordingly?

> Since I am not expert in statistics

And yet you claim a "bug", and "Perl's rand() function failed"?

The documentation for rand also says this:

> (Note: If your rand function consistently returns numbers that are too large or too small, then your version of Perl was probably compiled with the wrong number of RANDBITS.)

So if you really, really suspect a bug, perhaps investigate that?

Re^2: Perl rand() generates larger numbers for small sample size, bug!
by Bethany (Scribe) on Aug 08, 2014 at 02:23 UTC
    > "rand()" is not cryptographically secure. You should not rely on it in security-sensitive situations.

    True, but if it really behaved as the OP said it does, it wouldn't even be useful for dice games.

    The bug exists, but it's in the OP's testing script, not in rand(). $sum keeps accumulating more and more numbers and is then divided by 7. The first time through, the sum of 7 randoms gets divided by 7. So far so good. But the second time through, the sum of 14 randoms (the first seven plus seven more) gets divided by 7. The third time, the sum of 21 randoms gets divided by 7, and so forth.

    ETA: The above paragraph is buggy too! What I should have said is that the second time through, you're adding the sum of 7 more randoms plus 1/7 of the sum of the first 7. The third time, you have seven more, plus 1/7 the sum of the second 7, plus 1/49 the sum of the first 7, and so forth. That's why the discrepancy is greater with a small "sample size": the fraction carried over is 1 divided by the sample size, so a small denominator leaves a larger quotient in the running total. It's also why increasing the number of times the outer loop runs produces an asymptotic effect: the more you increase it, the less effect further increases have on the discrepancy from 0.5, because each earlier loop's contribution is divided by 7 again on every pass (after k outer loops, the first loop's share has been divided by 7 to the power k - 1), so early errors practically vanish.
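
    The ETA's arithmetic can be written out as a recurrence. This is a sketch under an assumption: the OP's script isn't quoted here, so it assumes, as the ETA describes, that $sum is divided in place by the sample size n (7 in the OP's runs) and never reset. The symbols are introduced only for illustration: R_k is the sum of the n fresh rand() values in outer loop k (expected value n/2, since rand() is uniform on [0, 1)), S_k is $sum just before the division, and A_k = S_k / n is the printed average.

```latex
\begin{aligned}
S_k &= \frac{S_{k-1}}{n} + R_k, \qquad S_0 = 0, \\
S_k &= R_k + \frac{R_{k-1}}{n} + \frac{R_{k-2}}{n^2} + \cdots + \frac{R_1}{n^{k-1}}, \\
\mathbb{E}[A_k] &= \frac{1}{n}\sum_{j=0}^{k-1} \frac{n/2}{n^{j}}
  = \frac{1}{2}\sum_{j=0}^{k-1} n^{-j}
  \;\longrightarrow\; \frac{n}{2(n-1)} \quad (k \to \infty), \\
\lim_{k \to \infty} \mathbb{E}[A_k] - \tfrac{1}{2} &= \frac{1}{2(n-1)}.
\end{aligned}
```

    With n = 7 this gives S_3 = R_3 + R_2/7 + R_1/49, exactly the ETA's description, and a long-run average near 7/12 ≈ 0.583, about 0.083 too high. With n = 1000 the excess shrinks to roughly 0.0005, which is why the effect looks like "larger numbers for small sample size".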

    The fix is easy. Clear $sum to zero at the beginning of each outer loop and voilà: the results are right around 0.5, where they ought to be. I tried the code before posting to be sure, and yup, it works.
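
    To see the buggy and fixed loops side by side, here is a minimal sketch. It is written in Python rather than Perl, with random.random() standing in for rand(), and the OP's actual script is not reproduced. The function names buggy_averages and fixed_averages are hypothetical, and the in-place division in the buggy version is an assumption based on the ETA.

```python
import random


def buggy_averages(n, rounds, rng):
    """Assumed bug: the running sum is divided in place and never reset."""
    total = 0.0
    averages = []
    for _ in range(rounds):
        for _ in range(n):
            total += rng.random()   # stand-in for Perl's rand()
        total /= n                  # assumption: like `$sum /= 7`; total/n leaks into the next round
        averages.append(total)
    return averages


def fixed_averages(n, rounds, rng):
    """The fix: clear the sum at the start of every outer loop."""
    averages = []
    for _ in range(rounds):
        total = 0.0                 # reset, as the reply suggests
        for _ in range(n):
            total += rng.random()
        averages.append(total / n)
    return averages


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(2014)       # fixed seed so runs are repeatable
    n, rounds = 7, 20_000
    print(f"buggy: {mean(buggy_averages(n, rounds, rng)):.3f}")
    print(f"fixed: {mean(fixed_averages(n, rounds, rng)):.3f}")
```

    With n = 7, the buggy mean should land near 7/12 ≈ 0.583 and the fixed mean near 0.5. Raising n pulls the buggy figure toward 0.5 as well, even though the random source never changes.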