Apart from time benchmarks, a shuffle algorithm should also be assessed on the quality of the randomness of the shuffled array. One way to do this is to calculate the auto-correlation of the shuffled sequence at lag 1, i.e. between consecutive elements. The absolute value of the auto-correlation coefficient approaches 1 when the sequence is highly auto-correlated (for example the test array 1..1000) and approaches zero when consecutive elements are unrelated. So a good-quality shuffle should produce auto-correlations close to zero.
Edit: suggested test scenario: start with a highly correlated array, e.g. 1..1000, for which

    perl -MStatistics::Autocorrelation -e 'print Statistics::Autocorrelation->new()->coefficient(data=>[1..1000],lag=>1)."\n"'

yields 0.997, and see how far the shuffling algorithm de-correlates it, i.e. lowers its auto-correlation coefficient towards zero.
Edit 2: the auto-correlation coefficient lies in the range -1 to 1. Both extremes indicate a highly auto-correlated sequence and zero indicates no auto-correlation. In this test I take the absolute value of the coefficient.
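To make the measure concrete, here is a minimal core-Perl sketch of a lag-k sample auto-correlation. It uses the usual textbook estimator (lagged products of deviations from the mean, divided by the total sum of squared deviations). That Statistics::Autocorrelation uses exactly this normalisation is an assumption, although for 1..1000 at lag 1 this formula gives 1 - 3/1000 = 0.997, which agrees with the figure quoted above. The function name autocorr is made up for this example.

```perl
use strict;
use warnings;
use List::Util qw(sum shuffle);

# Lag-k sample auto-correlation (textbook estimator; assumed, not taken
# from the Statistics::Autocorrelation source).
sub autocorr {
    my ($data, $lag) = @_;
    my $n    = @$data;
    my $mean = sum(@$data) / $n;
    my ($num, $den) = (0, 0);
    for my $i (0 .. $n - 1) {
        my $d = $data->[$i] - $mean;
        $den += $d * $d;                                   # total squared deviation
        $num += $d * ($data->[$i + $lag] - $mean)          # lagged product
            if $i + $lag < $n;
    }
    return $den == 0 ? 0 : $num / $den;
}

my $sorted   = autocorr([1 .. 1000], 1);
my $shuffled = abs( autocorr([shuffle 1 .. 1000], 1) );
printf "sorted: %.3f  shuffled: %.3f\n", $sorted, $shuffled;
```

The sorted array scores close to 1, while a shuffled one should land near zero, with the exact value varying from run to run.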
The script below compares the three methods mentioned here (BrowserUK's, tybalt89's and List::Util::shuffle) with respect to auto-correlation. For each trial it also plots a histogram of the differences between consecutive elements of the shuffled array, just for fun.
The best shuffle is the one that produces the lowest mean auto-correlation with the lowest variance and the most successes (a success means it had the minimum auto-correlation of the three in a given trial).
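As a small illustration of this scoring, the sketch below counts successes per trial and computes the mean and standard deviation over all trials. Note this is a variation: the test program only records the winner's coefficient for each trial. The method names and the numbers are made up purely for the example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical |auto-correlation| values, one entry per trial (made-up data).
my %coeffs = (
    method_a => [0.010, 0.004, 0.020],
    method_b => [0.012, 0.009, 0.001],
);

sub mean { return sum(@_) / scalar(@_) }

sub stdev {    # sample standard deviation (divides by n - 1)
    my $avg = mean(@_);
    return 0 if @_ < 2;
    return sqrt( sum( map { ($_ - $avg) ** 2 } @_ ) / (@_ - 1) );
}

# A "success" goes to the method with the smallest coefficient in a trial.
my %wins   = map { $_ => 0 } keys %coeffs;
my $trials = scalar @{ $coeffs{method_a} };
for my $t (0 .. $trials - 1) {
    my ($best) = sort { $coeffs{$a}[$t] <=> $coeffs{$b}[$t] } keys %coeffs;
    $wins{$best}++;
}

for my $name (sort keys %coeffs) {
    printf "%s : %d successes, mean:%.4f, stdev:%.4f\n",
        $name, $wins{$name}, mean(@{ $coeffs{$name} }), stdev(@{ $coeffs{$name} });
}
```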
    ./fisher_yates.pl : after 5000 trials shuffling arrays of size 1000:
    List::Util::shuffle : 1693 successes, mean:0.0105896962736892, stdev:0.00900688731621982
    BUK : 1685 successes, mean:0.010799062825769, stdev:0.00921403469412604
    tybalt89 : 1622 successes, mean:0.0102906705829024, stdev:0.00843760632828801

once more:

    ./fisher_yates.pl : after 5000 trials shuffling arrays of size 1000:
    BUK : 1696 successes, mean:0.0104235933728858, stdev:0.00897497055761236
    List::Util::shuffle : 1690 successes, mean:0.0106133000677379, stdev:0.00908235156157047
    tybalt89 : 1614 successes, mean:0.0100835174626996, stdev:0.00897955319759652

once more:

    ./fisher_yates.pl : after 5000 trials shuffling arrays of size 1000:
    List::Util::shuffle : 1690 successes, mean:0.0104611128054915, stdev:0.00886345338184372
    BUK : 1658 successes, mean:0.0102429744950854, stdev:0.00848038138137249
    tybalt89 : 1652 successes, mean:0.0105683142305418, stdev:0.00899061563593633
My opinion: all algorithms work well with respect to randomness (as assessed by auto-correlation) and now we can move to time benchmarks.
TODO: try with a different random number generator (i.e. more reliably uniform).
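One possible way to prepare for this TODO is to make the random source a parameter of the shuffle, so that a different generator can be dropped in without touching the algorithm. The sketch below is a plain Fisher-Yates shuffle that takes a coderef; the name fisher_yates_with_rng and the coderef interface (called with n, returns an integer in 0..n-1) are assumptions for illustration and are not one of the three methods compared here.

```perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle with a pluggable random source.
# $rng->($n) is assumed to return an integer in 0 .. $n-1.
sub fisher_yates_with_rng {
    my ($arr, $rng) = @_;
    for my $i (reverse 1 .. $#$arr) {
        my $j = $rng->($i + 1);                  # pick from the unshuffled part 0..$i
        @$arr[$i, $j] = @$arr[$j, $i];           # swap the two elements
    }
    return $arr;
}

# Default source: Perl's built-in rand().
my $builtin_rng = sub { int rand $_[0] };

my @array = (1 .. 10);
fisher_yates_with_rng(\@array, $builtin_rng);
print "@array\n";
```

Any other generator could then be wrapped in a coderef with the same contract and passed in place of $builtin_rng, and the auto-correlation test re-run unchanged.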
The test program:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    use Statistics::Histogram;
    use Statistics::Autocorrelation;
    use Statistics::Descriptive;
    use List::Util qw/shuffle/;

    my $N      = 1000;
    my $trials = 50;    # the runs quoted above report 5000 trials

    # winner name => list of the winning auto-correlation coefficients.
    # Only the winner's coefficient is recorded for each trial, so the
    # mean/stdev printed below are over that method's winning trials.
    my %mins = ();
    for (1 .. $trials) {
        my $res = assess_once();
        if ( exists $mins{ $res->[0] } ) {
            push( @{ $mins{ $res->[0] } }, $res->[1]->[1] );
        } else {
            $mins{ $res->[0] } = [ $res->[1]->[1] ];
        }
    }

    print "$0 : after $trials trials shuffling arrays of size $N:\n";
    foreach (keys %mins) {
        my @re    = @{ $mins{$_} };
        my $stats = Statistics::Descriptive::Full->new();
        $stats->add_data(@re);
        print $_." : ".scalar(@re)." successes, mean:".$stats->mean()
            .", stdev:".$stats->standard_deviation()."\n";
    }
    exit(0);

    # Shuffle 1..$N with each method and return [winner, [histogram, |autocorr|]].
    sub assess_once {
        my %results = ();

        my @array = 1 .. $N;
        shuffleAry_1( \@array );
        $results{'BUK'} = [
            histo_of_differences(\@array),
            corello_abs(\@array)
        ];

        @array = List::Util::shuffle(1 .. $N);
        $results{'List::Util::shuffle'} = [
            histo_of_differences(\@array),
            corello_abs(\@array)
        ];

        @array = 1 .. $N;
        @array = @{ shuffleAry_2( \@array ) };
        $results{'tybalt89'} = [
            histo_of_differences(\@array),
            corello_abs(\@array)
        ];

        # Despite the name, this sorts ascending: element 0 has the lowest coefficient.
        my @keys_sorted_autocor_desc =
            sort { $results{$a}->[1] <=> $results{$b}->[1] } keys %results;

        foreach (@keys_sorted_autocor_desc) {
            my $hist    = $results{$_}->[0];
            my $autocor = $results{$_}->[1];
            print $_.") Autocorrelation coefficient: ".$autocor."\n";
            print $_.") Histogram of the differences of consecutive elements:\n".$hist."\n";
            print "--------------------------\n\n\n";
        }
        foreach (@keys_sorted_autocor_desc) {
            my $autocor = $results{$_}->[1];
            print $_.") Autocorrelation coefficient: ".$autocor."\n";
        }
        print "assess() : minimum autocorrelation coeff is "
            .$results{ $keys_sorted_autocor_desc[0] }->[1]
            ." for ".$keys_sorted_autocor_desc[0]
            ."\n";
        print "assess() : done\n";
        return [ $keys_sorted_autocor_desc[0], $results{ $keys_sorted_autocor_desc[0] } ];
    }

    # tybalt89: sort on random keys, returns a new array reference
    sub shuffleAry_2 {
        my $arr = $_[0];
        return [
            map $_->[0],
            sort { $a->[1] <=> $b->[1] }
            map [ $_, rand ],
            @{$arr}
        ];
    }

    # BrowserUK: in-place swap-based shuffle on the aliased array
    sub shuffleAry_1 {
        die 'Need array reference' unless ref( $_[0] ) eq 'ARRAY';
        our( @aliased, $a, $b );
        local( *aliased, $a, $b ) = $_[0];
        $a = $_ + rand @aliased - $_,
        $b = $aliased[ $_ ],
        $aliased[ $_ ] = $aliased[ $a ],
        $aliased[ $a ] = $b
            for 0 .. $#aliased;
        return;
    }

    # Absolute value of the lag-1 auto-correlation coefficient
    sub corello_abs {
        my $arr   = $_[0];
        my $acorr = Statistics::Autocorrelation->new();
        return abs( $acorr->coefficient( data => $arr, lag => 1 ) );
    }

    # Histogram of |x[i] - x[i-1]| over consecutive elements
    sub histo_of_differences {
        my $arr   = $_[0];
        my $N     = $#$arr;
        my @diffs = (0) x ($N);
        for (1 .. $N) {
            $diffs[ $_ - 1 ] = abs( $arr->[$_] - $arr->[ $_ - 1 ] );
        }
        return Statistics::Histogram::get_histogram(\@diffs);
    }
In reply to Re: Shuffling CODONS
by bliako
in thread Shuffling CODONS
by WouterVG