For the second question (is Math::Random::MT is a correct implementation of the Mersenne Twister algorithm?), I have no idea. But it does say (in its documentation) that
Based on the C implementation of MT19937 Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura
If we interpret "Based" as "exact translation" then unless there is a bug in that code it must be correctly implemented. I guess.
For the third question (is the Fisher-Yates algorithm proven to produce fair shuffles?), I quote from wikipedia's entry of FY
Care must be taken when implementing the Fisher–Yates shuffle, both in the implementation of the algorithm itself and in the generation of the random numbers it is built on, otherwise the results may show detectable bias. A number of common sources of bias have been listed below...
Assuming code is correct, then it all boils down to the RNG.
If this source A Brief History of Random Numbers is right then the first PRNG came in 1946, long after FY (1938). Did FY know about biased RNG then or is it when PRNGs were introduced that RNG bias started being an issue? I don't know. But it is quite reasonable to ask any stochastic algorithm maker to tell me how their algorithm performs given a few different PRNGs.
For the fourth question (is my implementation we've been using a correct implementation of that algorithm?), here is wikipedia's algorithm with later enhancements:
-- To shuffle an array a of n elements (indices 0..n-1):
for i from 0 to n-2 do
j <- random integer such that i <= j < n
exchange a[i] and a[j]
First your "create random integer" part of the code
$n = scalar(@aliased);
$a = $_ + rand @aliased - $_ # original code
= i + rand($n-i) # which becomes....
= i + choiceof(0, 1, ..., $n-i-1) # perl's rand(10) dumps 0-9 inclu
+sive
= random integer such that i <= j <= i+$n-i-1
= random integer such that i <= j <= $n-1
= random integer such that i <= j < $n
I think that validates OK.
Next the loop:
for 0 .. $#aliased {
$a = $_ + rand @aliased - $_;
$b = $aliased[ $_ ],
$aliased[ $_ ] = $aliased[ $a ],
$aliased[ $a ] = $b
}
becomes
for i from 0 to n-1 do
j <- random integer such that i <= j < n
exchange a[i] and a[j]
The only discrepancy I can spot is that your loop is up to (n-1) instead of (n-2). Maybe I got something wrong.
Regarding the first question now (is the Mersenne Twister algorithm an excellent example of a PRNG?), there are many situations which one PRNG is better than another. For example, Math::Random::MT's overview states This algorithm has a very uniform distribution and is good for modelling purposes but do not use it for cryptography.
I have concocted three more tests in addition to the autocorrelation coefficient (already mentioned) and chi-squared (already mentioned) and of course the evenness of all outcomes in the long run:
Random walk test, use the PRNG to do a random walk and if it is any good then random walk theoretical convergence will be confirmed, random_walk_test.pl (it is using Statistics::Test::RandomWalk)
Compression test, gzip a random sequence of integers and see what compression ratio is achieved. A truly random sequence should not be compressed more than one which contains patterns, compression_test.pl :
Build transition matrix for a sequence of random integers with specified lag, i.e. count the occurence of rand[i] and rand[i+lag] for each i. For lag=1 we count the transitions between neighbours, e.g. if current random number was 4 and next random number is 5 then increment count of counts[4][5]. For a true random sequence of integers we expect that all transitions have equal probabilities (counts), well they aint, transition_matrix_test.pl follows :
Some results:
./random_walk_test.pl : done 10000000 trials with 10 bins, here is a s
+ummary:
./random_walk_test.pl : RandomWalk test : Math::Random::MT's rand() wi
+th mean %diff from expected to be 0.022 %
./random_walk_test.pl : RandomWalk test : Perl's rand() with mean %dif
+f from expected to be 0.019 %
./random_walk_test.pl : best results for RandomWalk test : Perl's rand
+() with mean %diff from expected to be 0.019 %
./random_walk_test.pl : end.
./compression_test.pl : done 100 repeats each time compressing a strin
+g of random integers of length 1000000, here is a summary:
./compression_test.pl : Gzip compression test : Math::Random::MT's ran
+d() with mean %diff from expected to be 72.231 %
./compression_test.pl : Gzip compression test : Perl's rand() with mea
+n %diff from expected to be 72.232 %
./compression_test.pl : best results for Gzip compression test : Perl'
+s rand() with mean %diff from expected to be 72.232 %
./compression_test.pl : end.
./transition_matrix_test.pl : done 2000000 repeats, with random intege
+rs between 0-30 inclusive and a lag of 1, here is a summary:
./transition_matrix_test.pl : transition matrix test : Perl's rand() w
+ith mean %diff from expected count to be 1.796 %
./transition_matrix_test.pl : transition matrix test : Math::Random::M
+T's rand() with mean %diff from expected count to be 1.797 %
./transition_matrix_test.pl : best results for transition matrix test
+: Perl's rand() with mean %diff from expected count to be 1.796 %
./transition_matrix_test.pl : end.
For these 3 tests the 2 candidates (Math::Random::MT and perls builtin rand()) do the same ON THE LONG RUN. So, perl's rand() is a decent choice compared to the alternative (MT) but there are margins for improvement, re: the transition matrix test.
bw, bliako |