comment on

Hello.

I'd like to share my observation how to shuffle an array with Perl's sort. This shuffle is far from perfect, and the result of shuffle isn't uniform. But may it be useful for someone.

!!! UPD. Using sort subroutine in that way is NOT RECOMMENDED! See further comments. ikegami and haukex provided explanation and cited documentation. !!!

Code:

@array = sort { 0.5 <=> rand } @array;
[download]

Perl's default sort algorithm is mergesort (perlsec: Algorithmic Complexity Attacks) from 5.8.0 version (Although there may be some optimizations or different behavior if an array length is less than 18? e.g. RFC: Simple switches for 'sort' and list 'reverse').
How does it work? Sort subroutine ignores values of both elements of a pair to compare, i.e. we don't use bindings: nor $a neither $b. Just a constant and rand() function. Here rand() produces some number from an interval 0 to 1 and compares it to a constant (i.e. 0.5). And the "spaceship" operator returns either -1, either 1 (hence with 0.5 probability). Then sort, depending on subroutine's value, changes positions of elements of a pair.

How the results obtained with this sort subroutine differ from uniformly shuffled array, e.g. Fisher-Yates shuffle?

Here is code which I used to produce distribution of occurrences. I.e. how many times the first element of the original array jumps to another places (i.e. indexes). The output is kinda pseudo-graphical histogram. In case of uniform distribution all values come in vertical columns. But in case of worse shuffle algorithm, columns are produced with slopes.

#!/usr/bin/perl

use warnings;
use strict;

$\ = $/;

srand 1;

# https://perlmonks.org/?node_id=1905
# randomly permutate @array in place
my $fisher_yates_shuffle = sub
{
    my $array = shift;
    my $i = @$array;
    while ( --$i )
    {
        my $j = int rand( $i+1 );
        @$array[$i,$j] = @$array[$j,$i];
    }
};

my $pseudoshuffle_with_sort = sub {
    my( $ref_array ) = shift;
    
    @{ $ref_array } = sort { 0.5 <=> rand } @{ $ref_array };
    };


my $size_of_array = 9;

my $A = 'A';
my @array = map { $A ++ } 1 .. $size_of_array;

my $times = 10000;

my @distribution;

@distribution = &distribution_of_occurrences( \@array, 
    $pseudoshuffle_with_sort, $times );

&pretty_squeeze( \@distribution, \@array );
print for '1)', @distribution;

@distribution = &distribution_of_occurrences( \@array, 
    $fisher_yates_shuffle, $times );

&pretty_squeeze( \@distribution, \@array );
print for '2)', @distribution;


sub distribution_of_occurrences{
    my( $ref_array, $shuffle_sub, $times ) = @_;
    
    my @distribution;
    
    for my $i ( 1 .. $times ){
        
        my @copy_of_array = @{ $ref_array };
        $shuffle_sub->( \@copy_of_array );
        
        my $j = 0;
        map { $distribution[ $j ++ ] .= " " . $_ } @copy_of_array;
        }

    return @distribution = map { 
        join ' ', sort { length $a <=> length $b || $a cmp $b } split 
        } @distribution;
}

sub pretty_squeeze{
    my( $ref_distribution, $ref_array ) = @_;
    
    my $times_approx = split ' ', $ref_distribution->[ 0 ];
    my $del_times = int $times_approx / @{ $ref_array } / 4;

    s/(\b\w+\b) \K (?:[ ]\1){0,$del_times}//xg for @{ $ref_distributio
+n };
    
    my $k = -1;
    my $rx_every_second = join '|', grep $k ++ % 2 == 0, @{ $ref_array
+ };
    s/\b(?:$rx_every_second)\b/./g for @{ $ref_distribution };
    }
[download]

OUTPUT:

1)
A A A . . . C C C . . . E E E E E . . . . . G G G G G . . . . . I I I 
+I I I I I I I
A A A . . . C C C . . . E E E E E . . . . . G G G G G G . . . . . . I 
+I I I I I I
A A A . . . C C C . . . E E E E E . . . . . G G G G G G G . . . . . . 
+. I I I I I I
A A A . . . C C C C . . . . E E E E E . . . . . G G G G G G . . . . . 
+. I I I I I
A A A A . . . . C C C C . . . . E E E E E . . . . . G G G G G . . . . 
+. I I I I
A A A A A . . . . . C C C C C . . . . . E E E E E . . . . . G G G G . 
+. . . I I I
A A A A A A . . . . . . C C C C C C . . . . . . E E E E . . . . G G G 
+. . . I I
A A A A A A A . . . . . . . C C C C C C C . . . . . . . E E E . . . G 
+G G . . . I I
A A A A A A A . . . . . . . C C C C C C C . . . . . . . E E E . . . G 
+G . . I I
2)
A A A A . . . . C C C C . . . . . E E E E . . . . . G G G G G . . . . 
+I I I I I
A A A A . . . . C C C C C . . . . . E E E E E . . . . G G G G G . . . 
+. . I I I I
A A A A A . . . . . C C C C . . . . . E E E E . . . . G G G G . . . . 
+. I I I I
A A A A A . . . . C C C C C . . . . E E E E E . . . . G G G G . . . . 
+I I I I I
A A A A A . . . . . C C C C C . . . . E E E E E . . . . G G G G . . . 
+. . I I I I I
A A A A A . . . . C C C C . . . . . E E E E E . . . . . G G G G . . . 
+. I I I I I
A A A A . . . . C C C C . . . . . E E E E . . . . . G G G G . . . . . 
+I I I I
A A A A . . . . . C C C C . . . . . E E E E . . . . G G G G G . . . . 
+I I I I
A A A A . . . . . C C C C . . . . E E E E . . . . . G G G G G . . . . 
+. I I I I
[download]

One can vary a value of variable $size_of_array to see how columns change with different array sizes. In the output example (with $size_of_array = 9) we can see non-vertical columns in '1)' (pseudo-shuffle with sort) and vertical columns in '2)' (Fisher-Yates shuffle). This data means that the first element of an array more often jumped to the end of an array, and the last element jumped more often to the beginning of an array.

Upd.: P.S. Note time complexity difference: mergesort is O(N log N), Fisher-Yates shuffle is O(N).

In reply to RFC: Kinda pseudo-shuffle using sort by rsFalse

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.