Hello.
I'd like to share my observation how to shuffle an array with Perl's sort. This shuffle is far from perfect, and the result of
shuffle isn't uniform. But may it be useful for someone.
!!! UPD. Using sort subroutine in that way is NOT RECOMMENDED! See further comments. ikegami and haukex provided explanation and cited documentation. !!!
Code:
@array = sort { 0.5 <=> rand } @array;
Perl's default sort algorithm is
mergesort (perlsec:
Algorithmic Complexity Attacks) from 5.8.0 version
(Although there may be some optimizations or different behavior if an array length is less than 18? e.g. RFC: Simple switches for 'sort' and list 'reverse').
How does it work? Sort subroutine ignores values of both elements of a pair to compare, i.e. we don't use bindings: nor
$a neither
$b. Just a constant and
rand() function. Here
rand() produces some number from an interval 0 to 1 and compares it to a constant (i.e. 0.5). And the "spaceship" operator returns either -1, either 1 (hence with 0.5 probability). Then
sort, depending on subroutine's value, changes positions of elements of a pair.
How the
results obtained with this sort subroutine
differ from uniformly shuffled array, e.g. Fisher-Yates shuffle?
Here is code which I used to produce distribution of occurrences. I.e. how many times the first element of the original array jumps to another places (i.e. indexes). The output is kinda pseudo-graphical histogram. In case of uniform distribution all values come in vertical columns. But in case of worse shuffle algorithm, columns are produced with slopes.
#!/usr/bin/perl
use warnings;
use strict;
$\ = $/;
srand 1;
# https://perlmonks.org/?node_id=1905
# randomly permutate @array in place
my $fisher_yates_shuffle = sub
{
my $array = shift;
my $i = @$array;
while ( --$i )
{
my $j = int rand( $i+1 );
@$array[$i,$j] = @$array[$j,$i];
}
};
my $pseudoshuffle_with_sort = sub {
my( $ref_array ) = shift;
@{ $ref_array } = sort { 0.5 <=> rand } @{ $ref_array };
};
my $size_of_array = 9;
my $A = 'A';
my @array = map { $A ++ } 1 .. $size_of_array;
my $times = 10000;
my @distribution;
@distribution = &distribution_of_occurrences( \@array,
$pseudoshuffle_with_sort, $times );
&pretty_squeeze( \@distribution, \@array );
print for '1)', @distribution;
@distribution = &distribution_of_occurrences( \@array,
$fisher_yates_shuffle, $times );
&pretty_squeeze( \@distribution, \@array );
print for '2)', @distribution;
sub distribution_of_occurrences{
my( $ref_array, $shuffle_sub, $times ) = @_;
my @distribution;
for my $i ( 1 .. $times ){
my @copy_of_array = @{ $ref_array };
$shuffle_sub->( \@copy_of_array );
my $j = 0;
map { $distribution[ $j ++ ] .= " " . $_ } @copy_of_array;
}
return @distribution = map {
join ' ', sort { length $a <=> length $b || $a cmp $b } split
} @distribution;
}
sub pretty_squeeze{
my( $ref_distribution, $ref_array ) = @_;
my $times_approx = split ' ', $ref_distribution->[ 0 ];
my $del_times = int $times_approx / @{ $ref_array } / 4;
s/(\b\w+\b) \K (?:[ ]\1){0,$del_times}//xg for @{ $ref_distributio
+n };
my $k = -1;
my $rx_every_second = join '|', grep $k ++ % 2 == 0, @{ $ref_array
+ };
s/\b(?:$rx_every_second)\b/./g for @{ $ref_distribution };
}
OUTPUT:
1)
A A A . . . C C C . . . E E E E E . . . . . G G G G G . . . . . I I I
+I I I I I I I
A A A . . . C C C . . . E E E E E . . . . . G G G G G G . . . . . . I
+I I I I I I
A A A . . . C C C . . . E E E E E . . . . . G G G G G G G . . . . . .
+. I I I I I I
A A A . . . C C C C . . . . E E E E E . . . . . G G G G G G . . . . .
+. I I I I I
A A A A . . . . C C C C . . . . E E E E E . . . . . G G G G G . . . .
+. I I I I
A A A A A . . . . . C C C C C . . . . . E E E E E . . . . . G G G G .
+. . . I I I
A A A A A A . . . . . . C C C C C C . . . . . . E E E E . . . . G G G
+. . . I I
A A A A A A A . . . . . . . C C C C C C C . . . . . . . E E E . . . G
+G G . . . I I
A A A A A A A . . . . . . . C C C C C C C . . . . . . . E E E . . . G
+G . . I I
2)
A A A A . . . . C C C C . . . . . E E E E . . . . . G G G G G . . . .
+I I I I I
A A A A . . . . C C C C C . . . . . E E E E E . . . . G G G G G . . .
+. . I I I I
A A A A A . . . . . C C C C . . . . . E E E E . . . . G G G G . . . .
+. I I I I
A A A A A . . . . C C C C C . . . . E E E E E . . . . G G G G . . . .
+I I I I I
A A A A A . . . . . C C C C C . . . . E E E E E . . . . G G G G . . .
+. . I I I I I
A A A A A . . . . C C C C . . . . . E E E E E . . . . . G G G G . . .
+. I I I I I
A A A A . . . . C C C C . . . . . E E E E . . . . . G G G G . . . . .
+I I I I
A A A A . . . . . C C C C . . . . . E E E E . . . . G G G G G . . . .
+I I I I
A A A A . . . . . C C C C . . . . E E E E . . . . . G G G G G . . . .
+. I I I I
One can vary a value of variable
$size_of_array to see how columns change with different array sizes. In the output example (with
$size_of_array = 9) we can see non-vertical columns in '1)' (pseudo-shuffle with sort) and vertical columns in '2)' (Fisher-Yates shuffle). This data means that the first element of an array more often jumped to the end of an array, and the last element jumped more often to the beginning of an array.
Upd.: P.S. Note time complexity difference: mergesort is
O(N log N), Fisher-Yates shuffle is
O(N).