Re: Perl rand() generates larger numbers for small sample size, bug!
by Bethany (Scribe) on Aug 08, 2014 at 01:12 UTC
#!/usr/bin/perl
use strict;
use warnings;
my $sampleSize = 7;
my ($sum, $ssum);
for my $i (1..1000){
    $sum = 0;    # reset the per-sample sum on every pass
    for my $j (1..$sampleSize){
        $sum+=rand();
    }
    $sum/=$sampleSize;    # mean of this sample
    $ssum+=$sum;
}
$ssum/=1000;    # mean of the 1000 sample means
print "$ssum\n";
If you think you've found a show-stopping bug in a core function that others have been using successfully for years or decades, it's a good idea to examine your own code before declaring it a bug in Perl.
Edited to add: Besides, how could the rand() function "know" how many times you're about to call it? Short of a maliciously buggy, dev-teasing version of Perl, which is too far-fetched to consider, there's no possible mechanism that could do this.
Great explanation! Just a nitpick: If you don't need the value of $sum outside the loop, declare it inside:
my $ssum = 0;    # $sum not needed here
for my $i (1..1000){
    my $sum = 0;    # but here.
    for my $j (1..$sampleSize){
        $sum+=rand();
    }
    $sum/=$sampleSize;
    $ssum+=$sum;
}
My understanding is not that the OP claimed to have found a bug in rand
The subject of the thread is "Perl rand() generates larger numbers for small sample size, bug!"
Anyway, mistakes are fine, but if one finds oneself saying "I found a bug" but also "I am not an expert", perhaps more cautious wording or some double-checking before crying wolf is in order :-)
You're right, of course. I was trying to be constructive rather than harsh. In fact, one of my previous replies contained the very same advice you give in your last sentence.
If my reply came off as a flame, however mild and non-scorchy, Ravi and you both have my apologies.
Okay, now this is embarrassing. I should have looked at my code more carefully. I am sorry for that. Is there a way to delete this post and keep it from becoming even more embarrassing for me?
Thank you for your time.
Deleting nodes is strongly discouraged. However, if you like, you can edit your post, using <strike> tags and marking updates as such to correct it; see "How do I change/delete my post?" for more information on this. You could also explain what the problem was and how it was fixed, which would maximize what others can learn from your post.
Mistakes happen, don't worry about it! Acknowledging them and fixing them is much more honorable and productive than trying to hide or defend them :-)
Re: Perl rand() generates larger numbers for small sample size, bug!
by Anonymous Monk on Aug 08, 2014 at 01:13 UTC
Why are random numbers important to you - what are you using them for? If truly random numbers are important to you, use a better source.
rand says this:
"rand()" is not cryptographically secure. You should not rely on
it in security-sensitive situations. As of this writing, a number
of third-party CPAN modules offer random number generators
intended by their authors to be cryptographically secure,
including: Data::Entropy, Crypt::Random, Math::Random::Secure, and
Math::TrulyRandom.
This average should be close to 0.5.
Really? Here's a good point to start your research: http://www.random.org/analysis/
It seems that with smaller sample size, rand() functions is biased to produce larger numbers but not the smaller number.
If it were true that it's a fault of rand and not something else, how would rand know the sample size and adjust its output accordingly...?
Since I am not expert in statistics
And yet you claim a "bug", and "Perl's rand() function failed"?
rand also says this:
(Note: If your rand function consistently returns numbers that are
too large or too small, then your version of Perl was probably
compiled with the wrong number of RANDBITS.)
So if you really, really suspect a bug, perhaps investigate that?
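One way to start that investigation is to ask the running perl what it was built with. This is only a minimal sketch: it assumes the core Config module reports the value under the key `randbits` (the same value `perl -V:randbits` prints), and it says nothing about whether that value is actually right for the platform.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;    # core module exposing the build-time configuration

# randbits: how many bits of randomness this build assumes rand() delivers
my $randbits = $Config{randbits};
print "perl $^V was built with randbits = ",
      (defined $randbits ? $randbits : 'unknown'), "\n";
```

A small value here (15, say) would not make rand() biased, but it would limit how many distinct values it can return.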
"rand()" is not cryptographically secure. You should not rely on it in security-sensitive situations.
True, but if it really behaved as the OP said it does it wouldn't even be useful for dice games.
The bug exists but it's in the OP's testing script, not in rand(). $sum keeps accumulating more and more numbers, then being divided by 7. The first time through the sum of 7 randoms gets divided by 7. So far so good. But the second time through the sum of 14 randoms (the first seven plus seven more) gets divided by 7. The third time, the sum of 21 randoms gets divided by 7, and so forth.
ETA: The above paragraph is buggy too! I should say the second time, you're adding the sum of 7 more randoms plus 1/7 of the sum of the first 7. The third time you have seven more, plus 1/7 the sum of the second 7, plus 1/49 the sum of the first 7, and so forth. That's why the discrepancy is greater with a small "sample size" — the denominator of the fraction is small, dividing by it results in a larger quotient. It's also why increasing the number of times the outer loop runs produces an asymptotic effect; the more you increase it, the less effect further increases have on the discrepancy from 0.5 because each loop's arithmetic error gets divided out by the (one-less-than-the-number-of-outer-loops-so-far)th power of 7, meaning early errors practically vanish.
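The arithmetic in the ETA can be written as a recurrence. This is a sketch that assumes the OP's script is the posted one minus the reset of $sum; the symbols are mine, not the thread's: n is the sample size, X_k is the sum of the n fresh rand() values in outer pass k, and s_k is what is left in $sum after the division in pass k.

```latex
\begin{align*}
s_0 &= 0, \qquad s_k = \frac{s_{k-1} + X_k}{n}, \qquad \mathbb{E}[X_k] = \frac{n}{2} \\
\mathbb{E}[s_k] &= \frac{\mathbb{E}[s_{k-1}]}{n} + \frac{1}{2}
                 = \frac{1}{2} \sum_{j=0}^{k-1} n^{-j}
                 \;\xrightarrow{\;k \to \infty\;}\; \frac{n}{2(n-1)}
\end{align*}
```

For n = 7 the limit is 7/12, roughly 0.583; for n = 100 it is 100/198, roughly 0.505. That matches both observations: the excess over 0.5 shrinks as the sample size grows, and adding more outer passes changes little because the terms of the sum fall off as powers of n.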
The fix is easy. Clear $sum to zero at the beginning of each outer loop and voila — results are right around 0.5 where they ought to be. I tried the code before posting to be sure and yup, it works.
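To see the effect side by side, here is a small sketch that runs the same loop with and without the reset. It is a reconstruction under the assumption that the OP's script differed from the posted one only by the missing reset; the sub name `mean_of_means` and the `$reset` flag are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mean of $passes per-sample averages.
# $reset true  => $sum is cleared before each sample (the fix);
# $reset false => $sum carries its leftover into the next pass (the bug).
sub mean_of_means {
    my ($sample_size, $passes, $reset) = @_;
    my ($sum, $ssum) = (0, 0);
    for my $pass (1 .. $passes) {
        $sum = 0 if $reset;
        $sum += rand() for 1 .. $sample_size;
        $sum /= $sample_size;    # without the reset, this quotient leaks into the next pass
        $ssum += $sum;
    }
    return $ssum / $passes;
}

my $buggy = mean_of_means(7, 1000, 0);
my $fixed = mean_of_means(7, 1000, 1);
printf "without reset: %.4f\n", $buggy;
printf "with reset:    %.4f\n", $fixed;
```

With a sample size of 7, the first figure should sit noticeably above 0.5 (somewhere around 0.58) and the second close to 0.5; the exact digits vary from run to run.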
Re: Perl rand() generates larger numbers for small sample size, bug!
by Cristoforo (Curate) on Aug 08, 2014 at 05:14 UTC
Possibly related: I counted how many distinct random numbers are produced, first by the built-in rand and then by rand from Math::Random::MT::Auto. I generated 2 ** 25 random numbers with each; the code and results are below.
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;
my %rand;
# count distinct values from the built-in rand
for (1 .. 2**25) {
    my $r = rand;
    $rand{ $r }++;
}
say scalar keys %rand;
# code compiled after this import uses the module's rand instead
use Math::Random::MT::Auto qw(rand);
%rand = ();
for (1 .. 2**25) {
    my $r = rand;
    $rand{ $r }++;
}
say scalar keys %rand;
Results:
C:\Old_Data\perlp>perl t.pl
32768
33554432
C:\Old_Data\perlp>perl -E "say 2**25"
33554432
C:\Old_Data\perlp>perl -E "say 2**15"
32768
C:\Old_Data\perlp>
You can see that out of 2**25 == 33_554_432 calls, the built-in rand produced only 2**15 == 32768 distinct values, whereas the module produced 2**25 distinct values, one for every call.
Update: perl version -
C:\Old_Data\perlp>perl -v
This is perl 5, version 14, subversion 1 (v5.14.1) built for MSWin32-x64-multi-thread
Windows version - 7
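One possible explanation, which is an assumption and not something confirmed in this thread, is that the built-in rand on this build has only 15 bits of resolution. The sketch below imitates such a generator on any perl by keeping just the top 15 bits of each value; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $bits  = 15;       # assumed resolution of the underlying generator
my $draws = 2**20;    # far more draws than there are 15-bit values

my %seen;
for (1 .. $draws) {
    # keep only the top $bits bits of each value, as a 15-bit rand() would
    my $r = int(rand() * 2**$bits) / 2**$bits;
    $seen{$r}++;
}

my $distinct = scalar keys %seen;
printf "%d draws, %d distinct values (ceiling %d)\n",
    $draws, $distinct, 2**$bits;
```

However many draws are made, the count of distinct values cannot exceed 2**15, which is the same ceiling the built-in rand hit above.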
$ perl5.12.3 -we '$rand{ rand() }++ for 1..2**22; print int keys %rand'
4194304
$ perl5.18.1 -we '$rand{ rand() }++ for 1..2**22; print int keys %rand'
4194304