For the second question (is Math::Random::MT is a correct implementation of the Mersenne Twister algorithm?), I have no idea. But it does say (in its documentation) that

Based on the C implementation of MT19937 Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura

If we interpret "Based" as "exact translation" then unless there is a bug in that code it must be correctly implemented. I guess.

For the third question (is the Fisher-Yates algorithm proven to produce fair shuffles?), I quote from wikipedia's entry of FY

Care must be taken when implementing the Fisher–Yates shuffle, both in the implementation of the algorithm itself and in the generation of the random numbers it is built on, otherwise the results may show detectable bias. A number of common sources of bias have been listed below...

Assuming code is correct, then it all boils down to the RNG.

If this source A Brief History of Random Numbers is right then the first PRNG came in 1946, long after FY (1938). Did FY know about biased RNG then or is it when PRNGs were introduced that RNG bias started being an issue? I don't know. But it is quite reasonable to ask any stochastic algorithm maker to tell me how their algorithm performs given a few different PRNGs.

For the fourth question (is my implementation we've been using a correct implementation of that algorithm?), here is wikipedia's algorithm with later enhancements:

-- To shuffle an array a of n elements (indices 0..n-1): for i from 0 to n-2 do j <- random integer such that i <= j < n exchange a[i] and a[j]

First your "create random integer" part of the code

$n = scalar(@aliased); $a = $_ + rand @aliased - $_ # original code = i + rand($n-i) # which becomes.... = i + choiceof(0, 1, ..., $n-i-1) # perl's rand(10) dumps 0-9 inclu +sive = random integer such that i <= j <= i+$n-i-1 = random integer such that i <= j <= $n-1 = random integer such that i <= j < $n

I think that validates OK.

Next the loop:

for 0 .. $#aliased { $a = $_ + rand @aliased - $_; $b = $aliased[ $_ ], $aliased[ $_ ] = $aliased[ $a ], $aliased[ $a ] = $b }

becomes

for i from 0 to n-1 do j <- random integer such that i <= j < n exchange a[i] and a[j]

The only discrepancy I can spot is that your loop is up to (n-1) instead of (n-2). Maybe I got something wrong.

Regarding the first question now (is the Mersenne Twister algorithm an excellent example of a PRNG?), there are many situations which one PRNG is better than another. For example, Math::Random::MT's overview states

This algorithm has a very uniform distribution and is good for modelling purposes but do not use it for cryptography.

I have concocted three more tests in addition to the autocorrelation coefficient (already mentioned) and chi-squared (already mentioned) and of course the evenness of all outcomes in the long run:

Random walk test, use the PRNG to do a random walk and if it is any good then random walk theoretical convergence will be confirmed, random_walk_test.pl (it is using Statistics::Test::RandomWalk)

#!/usr/bin/env perl use strict; use warnings; use Statistics::Test::RandomWalk; use Math::Random::MT; my $NUM_TRIALS ||=100; my $NUM_BINS ||= 10; my ($quant, $got, $expected); my $MTgen = Math::Random::MT->new(); sub perls_rand { rand() } sub mt_rand { $MTgen->rand } my %results = (); my $best_results = [ undef, undef ]; foreach( (["RandomWalk test : Perl's rand()", \&perls_rand, $NUM_TRIALS +, $NUM_BINS], ["RandomWalk test : Math::Random::MT's rand()", \&mt_rand, $N +UM_TRIALS, $NUM_BINS], ) ){ my $aresult = do_a_test(@$_); $results{$aresult->[3]} = $aresult; print $aresult->[0]."\n\n"; if( ! defined($best_results->[0]) or $best_results->[0] > $are +sult->[1] ){ $best_results->[0] = $aresult->[1]; # mean perc diff $best_results->[1] = $_->[0]; } } print "\n\n$0 : done, here is a summary:\n"; foreach(values %results){ print "$0 : ".$_->[3]." with mean %diff from expected to be ". +sprintf("%.3f",$_->[1])." %\n"; } print "\n\n$0 : best results for ".$best_results->[1]." with mean %dif +f from expected to be ".sprintf("%.3f",$best_results->[0])." %\n"; print "$0 : end.\n"; exit(0); # Does a trial given # a test label (string) # a sub ref to a rand() # the number of iterations # the number of bins # see http://search.cpan.org/dist/Statistics-Test-RandomWalk/lib/Stati +stics/Test/RandomWalk.pm # returns an arrayref with a report, mean percentage difference and ch +ange # and iters and nbins (see the return statement below) sub do_a_test { my ($test_label, $rand_sub, $niters, $nbins) = @_; my $tester_MT = Statistics::Test::RandomWalk->new(); defined($tester_MT) or die "tester_MT new() failed."; $tester_MT->set_data( [map {$rand_sub->()} 1..$niters] ); my ($quant, $got, $expected) = $tester_MT->test($nbins); my ($subreport, $mean_diff, $mean_change) = @{my_report_make($ +quant, $got, $expected)}; my $report = "$test_label\n after $niters ite +rations and using $nbins bins:\n" . $subreport . "======================= end ======================= +=" ; return [ $report, $mean_diff, $mean_change, $test_label, $niters, $nbins ] } # returns an arrayref of # a string report which can be printed # the mean percentage difference ( 200 * abs(A-B)/(A+B) %) # the mean percentage change ( 100 - 100*A/B % ) sub my_report_make { my ($quant, $got, $expected) = @_; my $N = scalar(@$quant); my ($i, $percent_diff, $percent_change); my $sum_diff = 0; my $sum_change = 0; my $ret = "Quantile | Got | Expected | % diff +| % change\n========================================================= +=====\n"; for($i=0;$i<$N;$i++){ $percent_diff = percentage_difference($got->[$i], $exp +ected->[$i]); $percent_change = percentage_change($got->[$i], $expec +ted->[$i]); $ret .= sprintf("%-11.3f%12.3f %14.3f%10.3f%12.3f\n", +$quant->[$i], $got->[$i], $expected->[$i], $percent_diff, $percent_ch +ange); $sum_diff += $percent_diff; $sum_change += $percent_change; } my $mean_diff = $sum_diff/$N; my $mean_change = $sum_change/$N; $ret .= sprintf("\ntotal percentage difference between got and + expected: %.7f %%\n>>>> mean percentage difference: %.7f %% <<<<\n", + $sum_diff, $mean_diff); $ret .= sprintf("\ntotal percentage change between got and exp +ected: %.7f %%\n>>>> mean percentage change: %.7f %% <<<<\n", $sum_c +hange, $mean_change); return [$ret, $mean_diff, $mean_change] } sub percentage_difference { my ($a, $b) = @_; return 200 * abs($a-$b) / ($a+$b) } sub percentage_change { my ($a, $b) = @_; return 100 - 100 * $a / $b }

Compression test, gzip a random sequence of integers and see what compression ratio is achieved. A truly random sequence should not be compressed more than one which contains patterns, compression_test.pl :

#!/usr/bin/env perl use strict; use warnings; use Gzip::Faster; use Math::Random::MT; my $DATA_LENGTH ||=100000; my $NUM_REPEATS ||=100; my ($quant, $got, $expected); my $MTgen = Math::Random::MT->new(); sub perls_rand { int(rand(1000000)) } sub mt_rand { int($MTgen->rand(1000000)) } my %results = (); my $best_results = [ undef, undef ]; foreach( (["Gzip compression test : Perl's rand()", \&perls_rand, $NUM_ +REPEATS, $DATA_LENGTH], ["Gzip compression test : Math::Random::MT's rand()", \&mt_ra +nd, $NUM_REPEATS, $DATA_LENGTH], ) ){ my $aresult = do_a_test(@$_); $results{$aresult->[1]} = $aresult; # key is the label print $aresult->[0]."\n\n"; # the report if( ! defined($best_results->[0]) or $best_results->[0] < $are +sult->[4] ){ $best_results->[0] = $aresult->[4]; # mean perc diff $best_results->[1] = $_->[0]; # the label } } print "\n\n$0 : done, here is a summary:\n"; foreach(values %results){ print "$0 : ".$_->[1]." with mean %diff from expected to be ". +sprintf("%.3f",$_->[4])." %\n"; } print "\n\n$0 : best results for ".$best_results->[1]." with mean %dif +f from expected to be ".sprintf("%.3f",$best_results->[0])." %\n"; print "$0 : end.\n"; exit(0); # Does a test given # a test label (string) # a sub ref to a rand() # the number of repeats # the string length to produce for compression # see http://search.cpan.org/dist/Statistics-Test-RandomWalk/lib/Stati +stics/Test/RandomWalk.pm # returns an arrayref with a report, mean percentage difference and ch +ange # and iters and nbins (see the return statement below) sub do_a_test { my ($test_label, $rand_sub, $nrepeats, $slength) = @_; my @compression_totals = (0,0,0,0); my ($a); print "$test_label : doing $nrepeats repeats and using random +strings of length $slength :\n"; for(1..$nrepeats){ my $random_str_to_compress = ""; for(1..$slength){ $random_str_to_compress .= $rand_sub->(); } my $gzipped_str = Gzip::Faster::gzip($random_str_to_co +mpress); my $length_gzipped_str = length($gzipped_str); my $length_random_str_to_compress = length($random_str +_to_compress); $compression_totals[0] += $length_random_str_to_compre +ss; $compression_totals[1] += $length_gzipped_str; $compression_totals[2] += percentage_difference($lengt +h_random_str_to_compress, $length_gzipped_str); $compression_totals[3] += percentage_change($length_ra +ndom_str_to_compress, $length_gzipped_str); print "repeat $_ / $nrepeats:\n" . " length of random string: ".$length_random +_str_to_compress."\n" . " length of compressed string: ".$length_gz +ipped_str."\n" . " compression ratio as percentage change: " +.percentage_difference($length_random_str_to_compress, $length_gzippe +d_str)."\n" . " compression ratio as percentage differenc +e: ".percentage_change($length_random_str_to_compress, $length_gzippe +d_str)."\n" } $_/=$nrepeats foreach(@compression_totals); my $report = "$test_label\n after $nrepeats repeats +and using random strings of length $slength :\n" . "mean length of random string: ".$compression_totals +[0]."\n" . "mean length of compressed string: ".$compression_to +tals[1]."\n" . "mean compression ratio as percentage change: ".$com +pression_totals[2]."\n" . "mean compression ratio as percentage difference: ". +$compression_totals[3] ; return [ $report, $test_label, @compression_totals ] } sub percentage_difference { my ($a, $b) = @_; return 200 * abs($a-$b) / ($a+$b) } sub percentage_change { my ($a, $b) = @_; return 100 - 100 * $a / $b }

Build transition matrix for a sequence of random integers with specified lag, i.e. count the occurence of rand[i] and rand[i+lag] for each i. For lag=1 we count the transitions between neighbours, e.g. if current random number was 4 and next random number is 5 then increment count of counts[4][5]. For a true random sequence of integers we expect that all transitions have equal probabilities (counts), well they aint, transition_matrix_test.pl follows :

#!/usr/bin/env perl use strict; use warnings; use Math::Random::MT; use Crypt::Random; use Data::Dumper; my $MTgen = Math::Random::MT->new(); # number of repeats my $NUM_REPEATS ||= 2000000; # range of rand ints to produce is 0..$MAX_RAND INCL! my $MAX_RAND ||= 30; # for LAG=1 it calculates the prob between two consecutive random numb +ers # for LAG=2, it checks when there is one-in-between the two rands. e.g +. A x B # e.g. LAG=1 rand[0] -> rand[1] # LAG=3 rand[0] -> rand[3] (rand [1] and [2] are skipped) my $LAG ||= 1; sub perls_rand { int(rand($MAX_RAND+1)) } sub mt_rand { int($MTgen->rand($MAX_RAND+1)) } # size: number of bits in base 10 i.e. a base10 digit # uniform for uniform rn # do not use these two because it will block after a few rand requests sub dev_rand { Crypt::Random::makerandom(Size=>1, Uniform=>1) } sub dev_u_rand { Crypt::Random::makerandom(Size=>1, Uniform=>1, St +rength=>0) } my %results = (); my $best_results = [ undef, undef ]; foreach( (["transition matrix test : Perl's rand()", \&perls_rand, $NUM +_REPEATS, $MAX_RAND, $LAG], ["transition matrix test : Math::Random::MT's rand()", \&mt_r +and, $NUM_REPEATS, $MAX_RAND, $LAG], # do not use this because it will block after a few rand requests: # ["transition matrix test : /dev/random's rand()", \&dev_rand, + $NUM_REPEATS, $MAX_RAND, $LAG], # ["transition matrix test : /dev/urandom's rand()", \&dev_u_ra +nd, $NUM_REPEATS, $MAX_RAND, $LAG], ) ){ my $aresult = do_a_test(@$_); $results{$aresult->[1]} = $aresult; print $aresult->[0]; # the report print "====================================\n\n\n"; if( ! defined($best_results->[0]) or $best_results->[0] > $are +sult->[2] ){ $best_results->[0] = $aresult->[2]; # mean deviation f +rom expected count $best_results->[1] = $_->[0]; # the label } } print "\n\n$0 : done, here is a summary:\n"; foreach(values %results){ print "$0 : ".$_->[1]." with mean %diff from expected count to + be ".sprintf("%.3f",$_->[2])." %\n"; } print "\n\n$0 : best results for ".$best_results->[1]." with mean %dif +f from expected count to be ".sprintf("%.3f",$best_results->[0])." %\ +n"; print "$0 : end.\n"; exit(0); sub do_a_test { my ($test_label, $rand_sub, $nrepeats, $maxrand, $lag) = @_; my @mat = ( map { [(0) x ($maxrand+1)] } (0..$maxrand) ); my $expected = $nrepeats / ($maxrand+1)**2; my $expected_probabilities = $expected / $nrepeats; my @allrands = (0..$maxrand); my ($i, $j); my $lag1 = $lag > 1 ? $lag-1 : 0; print "do_a_test() : $test_label :"; for(1..$nrepeats){ $i = $rand_sub->(); for(1..$lag1){ $rand_sub->() } $j = $rand_sub->(); $mat[$i][$j]++; if( $_ % 100000 == 0 ){ print " ".$_; select()->flush( +); } } my $deviations = 0; foreach $i (@allrands){ foreach $j (@allrands){ $mat[$i][$j] /= $nrepeats; # convert to probab +ilities $deviations += percentage_difference($expected +_probabilities, $mat[$i][$j]); } } $deviations /= ($maxrand+1)**2; my $report = "$test_label\n after $nrepeats repeats +with a lag of $lag and using random integers from 0 to $maxrand:\n"; $report .= " ".join(" ", @allrands)."\n" +; foreach $i (@allrands){ $report .= sprintf("%-5d", $i); foreach $j (@allrands){ $report .= sprintf("%10.7f ", $mat[$i][$j]); } $report .= "\n"; } $report .= "\nmean %diff from expected count ($expected) / exp +ected transition probability ($expected_probabilities) is ".sprintf(" +%.5f", $deviations)." % (instead of 0).\n"; return [$report, $test_label, $deviations ] } sub percentage_difference { my ($a, $b) = @_; return 200 * abs($a-$b) / ($a+$b) } sub percentage_change { my ($a, $b) = @_; return 100 - 100 * $a / $b }

Some results:

./random_walk_test.pl : done 10000000 trials with 10 bins, here is a s +ummary: ./random_walk_test.pl : RandomWalk test : Math::Random::MT's rand() wi +th mean %diff from expected to be 0.022 % ./random_walk_test.pl : RandomWalk test : Perl's rand() with mean %dif +f from expected to be 0.019 % ./random_walk_test.pl : best results for RandomWalk test : Perl's rand +() with mean %diff from expected to be 0.019 % ./random_walk_test.pl : end. ./compression_test.pl : done 100 repeats each time compressing a strin +g of random integers of length 1000000, here is a summary: ./compression_test.pl : Gzip compression test : Math::Random::MT's ran +d() with mean %diff from expected to be 72.231 % ./compression_test.pl : Gzip compression test : Perl's rand() with mea +n %diff from expected to be 72.232 % ./compression_test.pl : best results for Gzip compression test : Perl' +s rand() with mean %diff from expected to be 72.232 % ./compression_test.pl : end. ./transition_matrix_test.pl : done 2000000 repeats, with random intege +rs between 0-30 inclusive and a lag of 1, here is a summary: ./transition_matrix_test.pl : transition matrix test : Perl's rand() w +ith mean %diff from expected count to be 1.796 % ./transition_matrix_test.pl : transition matrix test : Math::Random::M +T's rand() with mean %diff from expected count to be 1.797 % ./transition_matrix_test.pl : best results for transition matrix test +: Perl's rand() with mean %diff from expected count to be 1.796 % ./transition_matrix_test.pl : end.

For these 3 tests the 2 candidates (Math::Random::MT and perls builtin rand()) do the same ON THE LONG RUN. So, perl's rand() is a decent choice compared to the alternative (MT) but there are margins for improvement, re: the transition matrix test.

bw, bliako


In reply to Re^11: Last comment (and nail in the coffin of) of S::CS :) by bliako
in thread Shuffling CODONS by WouterVG

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.