ravipatel4 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I wanted to test whether Perl's rand() function is fair or biased. I performed a couple of tests using different sample sizes. First I generated 7 random numbers between 0 and 1 and averaged them. This average should be close to 0.5. I repeated this test a thousand times and calculated the average of the 1000 averages to see whether it differs from 0.5. Surprisingly, it was 0.58. The following code describes this experiment. No matter how many times I run it, the value never gets close to 0.5.

$sampleSize = 7;
for $i (1..1000){
    for $j (1..$sampleSize){
        $sum+=rand();
    }
    $sum/=$sampleSize;
    $ssum+=$sum;
}
$ssum/=1000;
print "$ssum\n";

I performed the same experiment but now with $sampleSize=10 or $sampleSize=100. I received 0.55 and 0.50 as output for 10 and 100 respectively. It seems that with smaller sample size, rand() functions is biased to produce larger numbers but not the smaller number.

Since I am not expert in statistics, to test whether this is a statistical phenomenon, I performed the same experiment in R with a small sample size (7) using the following code. However, it generated a value very close to 0.5. Please refer to the following R code.

x2 <- numeric(1000)  # preallocate; x2 must exist before x2[i] is assigned
for (i in 1:1000) {
    x2[i] = mean(runif(7, 0, 1))
}
mean(x2)

Irrespective of the sample size (7, 10 or 100), R produced the same value but Perl's rand() function failed. I also tested the rand() and irand() functions of the Perl module Math::Random::Secure. They also tend to produce larger values for small sample sizes.

Please use the above code to reproduce these results. Any help or suggestions would be highly appreciated.

Never mind guys. I made a mistake in my code. A small silly mistake. Just needed to empty my $sum variable after each of 1000 iterations. Thank you for your time.

Best,

Ravi
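
For readers who want to see the effect side by side, here is a minimal sketch (not from the thread) that runs the original accumulation and the corrected one for the three sample sizes mentioned in the question. The helper name mean_of_means and its $reset flag are hypothetical, introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: average of $trials sample means of rand().
# With $reset false, $sum is carried over between trials, as in the
# original script; with $reset true, it is cleared before each trial.
sub mean_of_means {
    my ($sample_size, $trials, $reset) = @_;
    my ($sum, $ssum) = (0, 0);
    for my $i (1 .. $trials) {
        $sum = 0 if $reset;
        $sum += rand() for 1 .. $sample_size;
        $sum /= $sample_size;
        $ssum += $sum;
    }
    return $ssum / $trials;
}

for my $n (7, 10, 100) {
    printf "n=%3d  carried-over: %.3f  reset: %.3f\n",
        $n, mean_of_means($n, 1000, 0), mean_of_means($n, 1000, 1);
}
```

On a typical run the carried-over column should land near the 0.58, 0.55 and 0.50 reported in the question, while the reset column should stay near 0.5 for every sample size.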

Replies are listed 'Best First'.
Re: Perl rand() generates larger numbers for small sample size, bug!
by Bethany (Scribe) on Aug 08, 2014 at 01:12 UTC

    You aren't clearing $sum at the beginning of each outer loop. Try this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $sampleSize = 7;
    my ($sum, $ssum);
    for my $i (1..1000){
        $sum = 0;
        for my $j (1..$sampleSize){
            $sum+=rand();
        }
        $sum/=$sampleSize;
        $ssum+=$sum;
    }
    $ssum/=1000;
    print "$ssum\n";

    If you think you've found a show-stopping bug in a core function that others have been using successfully for years or decades, it's a good idea to examine your own code before declaring it a bug in Perl.

    Edited to add: Besides, how could the rand() function "know" how many times you're about to call it, or how many times you're not going to call it? Short of a maliciously buggy dev-teasing version of Perl, which is too far-fetched to consider, there's no possible mechanism that could do this.

      Great explanation! Just a nitpick: If you don't need the value of $sum outside the loop, declare it inside:
      my $ssum = 0;   # $sum not needed here
      for my $i (1..1000){
          my $sum = 0;   # but here.
          for my $j (1..$sampleSize){
              $sum+=rand();
          }
          $sum/=$sampleSize;
          $ssum+=$sum;
      }
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      If you think you've found a show-stopping bug in a core function that others have been using successfully for years or decades, it's a good idea to examine your own code before declaring it a bug in Perl.

      My understanding is not that the OP claimed to have found a bug in rand, but asked a question because his or her tests did not behave as expected. OK, the test was buggy, and it is very good that you pointed out where the mistake is, but there is no reason to be harsh on the OP. After all, we all make mistakes (well, at least I do). Actually, making mistakes is a great way to learn how not to make them again.
        My understanding is not that the OP claimed to have found a bug in rand

        The subject of the thread is "Perl rand() generates larger numbers for small sample size, bug!"

        Anyway, mistakes are fine, but if one finds oneself saying "I found a bug" but also "I am not an expert", perhaps more cautious wording or some double-checking before crying wolf is in order :-)

        You're right, of course. I was trying to be constructive rather than harsh. In fact, one of my previous replies contained the very same advice you give in your last sentence.

        If my reply came off as a flame, however mild and non-scorchy, Ravi and you both have my apologies.

      Okay, now this is embarrassing. I should have looked at my code carefully. I am sorry for that. Is there a way to delete this post and prevent it from becoming more embarrassing for me?

      Thank you for your time.

        No need to delete it, and I believe Perl Monks' policy is to discourage deleting posts. Everyone makes mistakes. If we learn from them, we're wiser for having made them.

        Deleting nodes is strongly discouraged. However, if you like, you can edit your post, using <strike> tags and marking updates as such to correct it - see "How do I change/delete my post?" for more good information on this. You could explain what the problem was and how it was fixed, that would maximize what others can learn from your post.

        Mistakes happen, don't worry about it! Acknowledging them and fixing them is much more honorable and productive than trying to hide or defend them :-)

Re: Perl rand() generates larger numbers for small sample size, bug!
by Anonymous Monk on Aug 08, 2014 at 01:13 UTC

    Why are random numbers important to you - what are you using them for? If truly random numbers are important to you, use a better source.

    rand says this:

    "rand()" is not cryptographically secure. You should not rely on it in security-sensitive situations. As of this writing, a number of third-party CPAN modules offer random number generators intended by their authors to be cryptographically secure, including: Data::Entropy, Crypt::Random, Math::Random::Secure, and Math::TrulyRandom.
    This average should be close to 0.5.

    Really? Here's a good point to start your research: http://www.random.org/analysis/

    It seems that with smaller sample size, rand() functions is biased to produce larger numbers but not the smaller number.

    If it were true that it's a fault of rand and not something else, how would rand know the sample size and adjust its output accordingly...?

    Since I am not expert in statistics

    And yet you claim a "bug", and "Perl's rand() function failed"?

    rand also says this:

    (Note: If your rand function consistently returns numbers that are too large or too small, then your version of Perl was probably compiled with the wrong number of RANDBITS.)

    So if you really, really suspect a bug, perhaps investigate that?

      "rand()" is not cryptographically secure. You should not rely on it in security-sensitive situations.

      True, but if it really behaved as the OP said it does it wouldn't even be useful for dice games.

      The bug exists but it's in the OP's testing script, not in rand(). $sum keeps accumulating more and more numbers, then being divided by 7. The first time through the sum of 7 randoms gets divided by 7. So far so good. But the second time through the sum of 14 randoms (the first seven plus seven more) gets divided by 7. The third time, the sum of 21 randoms gets divided by 7, and so forth.

      ETA: The above paragraph is buggy too! I should say the second time, you're adding the sum of 7 more randoms plus 1/7 of the sum of the first 7. The third time you have seven more, plus 1/7 the sum of the second 7, plus 1/49 the sum of the first 7, and so forth. That's why the discrepancy is greater with a small "sample size" — the denominator of the fraction is small, dividing by it results in a larger quotient. It's also why increasing the number of times the outer loop runs produces an asymptotic effect; the more you increase it, the less effect further increases have on the discrepancy from 0.5 because each loop's arithmetic error gets divided out by the (one-less-than-the-number-of-outer-loops-so-far)th power of 7, meaning early errors practically vanish.
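
The recurrence described above can be checked without any random numbers at all. The sketch below is an editorial illustration, not part of the original reply: it replaces each rand() call with its expected value 0.5 and iterates the same arithmetic, on the assumption that the expectation can be pushed through the loop because every step is linear. The function name expected_carried_mean is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: expected value of $sum after many outer loops
# when $sum is never cleared. Each rand() is replaced by its mean, 0.5.
sub expected_carried_mean {
    my ($sample_size, $loops) = @_;
    my $sum = 0;
    for (1 .. $loops) {
        $sum += 0.5 * $sample_size;   # expected total of the new draws
        $sum /= $sample_size;         # leftover from earlier loops shrinks by 1/n
    }
    return $sum;
}

for my $n (7, 10, 100) {
    # Closed form of the limit: m = (m + n/2)/n  =>  m = n / (2*(n-1))
    printf "n=%3d  iterated: %.4f  closed form: %.4f\n",
        $n, expected_carried_mean($n, 50), $n / (2 * ($n - 1));
}
```

Under that assumption the limit is n / (2*(n-1)), which gives roughly 0.583, 0.556 and 0.505 for n = 7, 10 and 100, in line with the 0.58, 0.55 and 0.50 the OP reported.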

      The fix is easy. Clear $sum to zero at the beginning of each outer loop and voila — results are right around 0.5 where they ought to be. I tried the code before posting to be sure and yup, it works.

Re: Perl rand() generates larger numbers for small sample size, bug!
by Cristoforo (Curate) on Aug 08, 2014 at 05:14 UTC
    Possibly related: the number of distinct random numbers produced, first with the built-in rand and then with rand from Math::Random::MT::Auto. I generated 2 ** 25 random numbers with each; the code and results are below.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.014;

    # Count distinct values returned by the built-in rand.
    my %rand;
    for (1 .. 2**25) {
        my $r = rand;
        $rand{ $r }++;
    }
    say scalar keys %rand;

    # From here on, rand is the one imported from the module.
    use Math::Random::MT::Auto qw(rand);

    %rand = ();
    for (1 .. 2**25) {
        my $r = rand;
        $rand{ $r }++;
    }
    say scalar keys %rand;

    Results:
    C:\Old_Data\perlp>perl t.pl
    32768
    33554432

    C:\Old_Data\perlp>perl -E "say 2**25"
    33554432

    C:\Old_Data\perlp>perl -E "say 2**15"
    32768

    C:\Old_Data\perlp>

    You can see that out of 2**25 == 33_554_432 calls, the built-in rand produced only 2**15 == 32768 distinct values, while the module produced 2**25 distinct values, one for every call.

    Update: perl version -
    C:\Old_Data\perlp>perl -v
    This is perl 5, version 14, subversion 1 (v5.14.1) built for MSWin32-x64-multi-thread

    Windows version - 7
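
One possible explanation, offered here as an assumption rather than something established in the thread, is the RANDBITS note quoted from the rand documentation earlier: a build with 15 random bits can return at most 2**15 distinct values. The sketch below prints the randbits entry from the core Config module and counts distinct values over a smaller run. The helper name count_distinct is hypothetical.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: number of distinct values seen in $calls calls to rand().
sub count_distinct {
    my ($calls) = @_;
    my %seen;
    $seen{ rand() }++ for 1 .. $calls;
    return scalar keys %seen;
}

# randbits is recorded when perl is built; it may be undefined on some builds.
my $bits = defined $Config{randbits} ? $Config{randbits} : 'unknown';
print "randbits: $bits\n";

my $calls = 2**18;
printf "%d calls produced %d distinct values\n", $calls, count_distinct($calls);
```

If the count tops out at 32768 no matter how many calls are made, a 15-bit generator is a likely cause, and comparing the randbits value across the two machines would be a reasonable next step.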

      Could you provide more info on your perl version, etc.? I am unable to replicate this finding (on x86_64-linux):

      $ perl5.12.3 -we '$rand{ rand() }++ for 1..2**22; print int keys %rand'
      4194304
      $ perl5.18.1 -we '$rand{ rand() }++ for 1..2**22; print int keys %rand'
      4194304