But when that biased PRNG used within a standard Fisher Yates shuffle, the shuffles are fair, no matter how many times you run it
I am not convinced.
A statistical test for assessing fairness is chi-square: throw a die a million times and feed chi-square the counts for each outcome: '1' => ..., '2' => ..., ... '6' => .... It will tell you whether it thinks the die is fair or not.
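To make the die example concrete, here is a minimal sketch of the statistic itself, computed by hand with core Perl only. The sub name chi_square_uniform and both sets of counts are made up for illustration. As far as I understand it, Statistics::ChiSquare works from the same statistic under the same assumption of an even expected distribution, and then reports a probability range instead of the raw number.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(sum);

# Pearson's chi-square statistic against a uniform expectation:
# the sum over all outcomes of (observed - expected)^2 / expected.
# (Hypothetical helper, not part of Statistics::ChiSquare.)
sub chi_square_uniform {
    my @observed = @_;
    my $expected = sum(@observed) / @observed;
    return sum( map { ( $_ - $expected )**2 / $expected } @observed );
}

# Made-up counts for 600 throws of a six-sided die.
my @fair_looking = ( 98, 103, 101, 97, 102, 99 );
my @loaded       = ( 160, 90, 88, 85, 92, 85 );

# Six outcomes give 5 degrees of freedom; the 5% critical value
# for 5 degrees of freedom is about 11.07.
printf "fair-looking: %.2f\n", chi_square_uniform(@fair_looking);
printf "loaded:       %.2f\n", chi_square_uniform(@loaded);
```

A statistic well below the critical value gives no reason to doubt the die; one far above it says the counts are very unlikely to come from a fair die.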
So I ran your test code with 3 rand-subs:
use Math::Random::MT;
my $mt = Math::Random::MT->new();
sub badRand($)  { rand() < 0.5 ? 0 : rand( $_[0] ) }
sub goodRand($) { rand($_[0]) }
sub bestRand($) { $mt->rand($_[0]) }
Then I ran the chi-squared test (Statistics::ChiSquare) on the counts of the shuffle (i.e. the output of your test code).
This is the result:
bad_shuffle : There's a <1% chance that this data is random.
good_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
or
bad_shuffle : There's a <1% chance that this data is random.
best_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
good_shuffle : There's a >10% chance, and a <25% chance, that this data is random.
or
bad_shuffle : There's a <1% chance that this data is random.
best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
good_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
The above test consistently rates "bad_shuffle", based on badRand(), as producing data with less than a 1% chance of being random. Which is best for this particular shuffle, Mersenne Twister or Perl's built-in rand(), is undecided, let's say.
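As a cross-check that needs no random numbers at all, here is a rough sketch that enumerates the exact probability of every ordering after one Fisher-Yates pass over (1..4) from a fixed starting order. It assumes badRand($k) ends up, once truncated to an array index, as offset 0 half the time and as a uniform offset in 0 .. $k-1 otherwise. The names enumerate and %prob are mine and appear nowhere in the test code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %prob;    # permutation string => exact probability

# Walk every sequence of swap choices a single Fisher-Yates pass can make.
sub enumerate {
    my ( $i, $p, @items ) = @_;
    if ( $i > $#items ) {
        $prob{ join '', @items } += $p;
        return;
    }
    my $k = @items - $i;    # number of candidate positions at step $i
    for my $offset ( 0 .. $k - 1 ) {
        # Assumed model of badRand($k): offset 0 half the time,
        # otherwise uniform over 0 .. $k-1 (which can also give 0).
        my $p_offset = 0.5 / $k + ( $offset == 0 ? 0.5 : 0 );
        my @next = @items;
        @next[ $i, $i + $offset ] = @next[ $i + $offset, $i ];
        enumerate( $i + 1, $p * $p_offset, @next );
    }
}

enumerate( 0, 1, 1 .. 4 );

printf "%s  %.4f\n", $_, $prob{$_} for sort keys %prob;
```

Under that model the untouched order 1234 comes out at 0.3125, against 1/24 (about 0.0417) for a fair shuffle, so a single pass driven by badRand() is far from uniform.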
The results your test code produced may look fair, but they ain't according to this particular test. It very much depends on the use-case (see end of post). You may be right with your hunch that Mersenne Twister offers nothing more than Perl's standard rand() for an FY shuffle, but hey, let's not exaggerate with badRand()! :)
This is my test code:
#!/usr/bin/env perl
use strict;
use Statistics::ChiSquare;
use Math::Random::MT;

my $mt = Math::Random::MT->new();
my $NUMTESTS = 100000;

sub badRand($)  { rand() < 0.5 ? 0 : rand( $_[0] ) }
sub goodRand($) { rand($_[0]) }
sub bestRand($) { $mt->rand($_[0]) }

my %tests = (
    'bad_shuffle'  => { 'randsub' => \&badRand,  'rands' => undef },
    'good_shuffle' => { 'randsub' => \&goodRand, 'rands' => undef },
    'best_shuffle' => { 'randsub' => \&bestRand, 'rands' => undef },
);

my @vals = ( 1..4 );
foreach (keys %tests){
    my $asub = $tests{$_}->{'randsub'};
    $tests{$_}{'results'} = do_a_shuffle($asub, @vals);
    $tests{$_}{'chi'} = Statistics::ChiSquare::chisquare(values %{$tests{$_}{'results'}});
    print $_ . " : " . $tests{$_}{'chi'}."\n";
}

# &randsub, @choices
sub shuffle {
    my $randsub = shift;
    # rest of params are the values/choices to shuffle
    $a = $_ + $randsub->( @_ - $_ ),
    $b = $_[$_],
    $_[$_] = $_[$a],
    $_[$a] = $b
        for 0 .. $#_;
    return @_;
}

# &randsub, @choices
sub do_a_shuffle {
    my %tests;
    for(1..$NUMTESTS){
        ++$tests{ join '', shuffle(@_) };
    }
    return \%tests;
}
__END__
sidenote:
A particular shuffle may be appropriate for a particular situation.
Such as?
In reply to Re^3: Shuffling CODONS
by bliako
in thread Shuffling CODONS
by WouterVG