
But when that biased PRNG used within a standard Fisher Yates shuffle, the shuffles are fair, no matter how many times you run it

I am not convinced.

A standard statistical test for fairness is chi-square: throw a die a million times and feed chi-square the counts for each outcome: '1' => ..., '2' => ..., ... '6' => .... It will tell you whether the observed counts are consistent with a fair die or not.
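To make the die example concrete, here is a minimal sketch that computes the Pearson chi-square statistic by hand instead of calling Statistics::ChiSquare. The sub name chi_square_uniform and the sample size are my own choices for illustration. The value 15.086 is the 1% critical value for 5 degrees of freedom (six outcomes minus one).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Pearson chi-square statistic for observed counts, assuming every
# outcome is expected equally often (hypothetical helper for this sketch).
sub chi_square_uniform {
    my @observed = @_;
    my $total = 0;
    $total += $_ for @observed;
    my $expected = $total / @observed;    # same expectation for each outcome
    my $stat = 0;
    $stat += ( $_ - $expected )**2 / $expected for @observed;
    return $stat;
}

my $ROLLS = 60_000;                       # arbitrary sample size
my %count = map { $_ => 0 } 1 .. 6;
++$count{ 1 + int rand 6 } for 1 .. $ROLLS;

my $stat = chi_square_uniform( @count{ 1 .. 6 } );

# Above the critical value, a fair die would produce counts this
# lopsided less than 1% of the time.
printf "chi-square = %.3f (%s at the 1%% level)\n", $stat,
    $stat > 15.086 ? "looks unfair" : "no evidence of unfairness";
```

Statistics::ChiSquare does the table lookup for you and reports the result as the probability brackets shown in the results further down.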

So I ran your test code with three rand subs:

use Math::Random::MT;
my $mt = Math::Random::MT->new();
sub badRand($) { rand() < 0.5 ? 0 : rand( $_[0] ) }
sub goodRand($) { rand($_[0]) }
sub bestRand($) { $mt->rand($_[0]) }

Then I ran the chi-squared test (Statistics::ChiSquare) on the counts of each shuffle outcome (i.e. the output of your test code).

This is the result:

bad_shuffle : There's a <1% chance that this data is random. 
good_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.

bad_shuffle : There's a <1% chance that this data is random.
best_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
good_shuffle : There's a >10% chance, and a <25% chance, that this data is random.

bad_shuffle : There's a <1% chance that this data is random.
best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
good_shuffle : There's a >50% chance, and a <75% chance, that this data is random.

The above test consistently rates "bad_shuffle", based on badRand(), as producing data with less than a 1% chance of being random. It is undecided, let's say, which is better for this particular shuffle: Mersenne Twister (Math::Random::MT) or Perl's built-in rand().
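One way to see why chi-square rejects bad_shuffle is to work out the exact probability of each permutation instead of sampling. The sketch below is mine, not part of the test code: it assumes every shuffle starts from the same input order and that the fractional index is truncated to an integer, as Perl does for array subscripts. The names bad_offset_prob and exact_distribution are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Probability that badRand($k), truncated to an integer, gives offset $j
# (0 <= $j < $k): half the time it returns 0 outright, otherwise it is
# uniform over 0 .. $k-1.
sub bad_offset_prob {
    my ( $j, $k ) = @_;
    return 0.5 / $k + ( $j == 0 ? 0.5 : 0 );
}

# Follow every path the Fisher-Yates loop can take and add up the
# path probabilities for each resulting permutation.
sub exact_distribution {
    my ( $prob_sub, @start ) = @_;
    my %dist;
    my $walk;
    $walk = sub {
        my ( $i, $p, @arr ) = @_;
        if ( $i > $#arr ) { $dist{ join '', @arr } += $p; return; }
        my $k = @arr - $i;                # number of remaining choices
        for my $j ( 0 .. $k - 1 ) {
            my @next = @arr;
            @next[ $i, $i + $j ] = @next[ $i + $j, $i ];    # swap
            $walk->( $i + 1, $p * $prob_sub->( $j, $k ), @next );
        }
    };
    $walk->( 0, 1, @start );
    return \%dist;
}

my $dist = exact_distribution( \&bad_offset_prob, 1 .. 4 );
printf "%s : %.4f\n", $_, $dist->{$_} for sort keys %$dist;
printf "uniform would be %.4f each\n", 1 / 24;
```

Under these assumptions the untouched order 1234 comes out at 0.625 * 0.6667 * 0.75 = 0.3125, against 1/24 (about 0.0417) for a fair shuffle.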

The results your test code produced may look fair, but according to this particular test they are not. How much that matters depends very much on the use case (see the end of this post). You may well be right in your hunch that Mersenne Twister offers nothing more than Perl's standard rand() for an FY shuffle, but hey, let's not exaggerate with badRand()! :)

This is my test code:

#!/usr/bin/env perl

use strict;
use Statistics::ChiSquare;
use Math::Random::MT;

my $mt = Math::Random::MT->new();

my $NUMTESTS = 100000;

sub badRand($) { rand() < 0.5 ? 0 : rand( $_[0] ) }
sub goodRand($) { rand($_[0]) }
sub bestRand($) { $mt->rand($_[0]) }

my %tests = (
        'bad_shuffle' => {
                'randsub' => \&badRand,
                'rands' => undef
        },
        'good_shuffle' => {
                'randsub' => \&goodRand,
                'rands' => undef
        },
        'best_shuffle' => {
                'randsub' => \&bestRand,
                'rands' => undef
        },
);
my @vals = ( 1..4 );

foreach (keys %tests){
        my $asub = $tests{$_}->{'randsub'};
        $tests{$_}{'results'} = do_a_shuffle($asub, @vals);
        $tests{$_}{'chi'} = Statistics::ChiSquare::chisquare(values %{$tests{$_}{'results'}});
        print $_ . " : " . $tests{$_}{'chi'}."\n";
}
# &randsub, @choices
sub     shuffle {
        my $randsub = shift;
        # rest of params are the values/choices to shuffle
        $a = $_ + $randsub->( @_ - $_ ),
        $b = $_[$_],
        $_[$_] = $_[$a],
        $_[$a] = $b
                for 0 .. $#_;
        return @_;
}
# &randsub, @choices
sub     do_a_shuffle {
        my %tests;
        for(1..$NUMTESTS){
                ++$tests{ join '', shuffle(@_) };
        }
        return \%tests;
}
__END__

sidenote:

A particular shuffle may be appropriate for a particular situation.

Such as?

1. Poker and co., where patterns in the shuffled cards may influence the game. The shuffled cards must be tested for the number of patterns they contain.
2. Medical trials: a list of "same-category" (e.g. all healthy) patients must be shuffled in order to then separate them into different groups for trial treatment.
3. Related to #2: I have a group with certain average properties. If I then split it randomly into groups after a shuffle, does each group have the same properties on average? Or have I grouped the tall boys together and put the short boys in another group, because my shuffle algorithm was not suitable?
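The group-balance question can be checked directly. This is a rough sketch with made-up heights, and it uses shuffle from the core List::Util module rather than the shuffle sub from my test code above. It shuffles a sorted list, splits it into two halves and compares the group means.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Hypothetical heights in cm, deliberately listed in sorted order so
# that a poor shuffle would leave the short ones in the first group.
my @heights = ( 150, 152, 155, 158, 160, 163, 165, 168, 170, 172, 175, 178 );

sub mean { return sum(@_) / @_ }

my @shuffled = shuffle @heights;
my $half     = int( @shuffled / 2 );
my @group_a  = @shuffled[ 0 .. $half - 1 ];
my @group_b  = @shuffled[ $half .. $#shuffled ];

printf "overall mean : %.1f cm\n", mean(@heights);
printf "group A mean : %.1f cm\n", mean(@group_a);
printf "group B mean : %.1f cm\n", mean(@group_b);
```

A single run proves nothing either way. You would repeat the split many times and look at how the difference between the two means is distributed.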

In reply to Re^3: Shuffling CODONS by bliako
in thread Shuffling CODONS by WouterVG