http://qs1969.pair.com?node_id=11129653

Hello.

I'd like to share my observation how to shuffle an array with Perl's sort. This shuffle is far from perfect, and the result of shuffle isn't uniform. But may it be useful for someone.

!!! UPD. Using sort subroutine in that way is NOT RECOMMENDED! See further comments. ikegami and haukex provided explanation and cited documentation. !!!

Code:
@array = sort { 0.5 <=> rand } @array;
Perl's default sort algorithm is mergesort (perlsec: Algorithmic Complexity Attacks) from 5.8.0 version (Although there may be some optimizations or different behavior if an array length is less than 18? e.g. RFC: Simple switches for 'sort' and list 'reverse').
How does it work? Sort subroutine ignores values of both elements of a pair to compare, i.e. we don't use bindings: nor $a neither $b. Just a constant and rand() function. Here rand() produces some number from an interval 0 to 1 and compares it to a constant (i.e. 0.5). And the "spaceship" operator returns either -1, either 1 (hence with 0.5 probability). Then sort, depending on subroutine's value, changes positions of elements of a pair.

How the results obtained with this sort subroutine differ from uniformly shuffled array, e.g. Fisher-Yates shuffle?

Here is code which I used to produce distribution of occurrences. I.e. how many times the first element of the original array jumps to another places (i.e. indexes). The output is kinda pseudo-graphical histogram. In case of uniform distribution all values come in vertical columns. But in case of worse shuffle algorithm, columns are produced with slopes.
#!/usr/bin/perl use warnings; use strict; $\ = $/; srand 1; # https://perlmonks.org/?node_id=1905 # randomly permutate @array in place my $fisher_yates_shuffle = sub { my $array = shift; my $i = @$array; while ( --$i ) { my $j = int rand( $i+1 ); @$array[$i,$j] = @$array[$j,$i]; } }; my $pseudoshuffle_with_sort = sub { my( $ref_array ) = shift; @{ $ref_array } = sort { 0.5 <=> rand } @{ $ref_array }; }; my $size_of_array = 9; my $A = 'A'; my @array = map { $A ++ } 1 .. $size_of_array; my $times = 10000; my @distribution; @distribution = &distribution_of_occurrences( \@array, $pseudoshuffle_with_sort, $times ); &pretty_squeeze( \@distribution, \@array ); print for '1)', @distribution; @distribution = &distribution_of_occurrences( \@array, $fisher_yates_shuffle, $times ); &pretty_squeeze( \@distribution, \@array ); print for '2)', @distribution; sub distribution_of_occurrences{ my( $ref_array, $shuffle_sub, $times ) = @_; my @distribution; for my $i ( 1 .. $times ){ my @copy_of_array = @{ $ref_array }; $shuffle_sub->( \@copy_of_array ); my $j = 0; map { $distribution[ $j ++ ] .= " " . $_ } @copy_of_array; } return @distribution = map { join ' ', sort { length $a <=> length $b || $a cmp $b } split } @distribution; } sub pretty_squeeze{ my( $ref_distribution, $ref_array ) = @_; my $times_approx = split ' ', $ref_distribution->[ 0 ]; my $del_times = int $times_approx / @{ $ref_array } / 4; s/(\b\w+\b) \K (?:[ ]\1){0,$del_times}//xg for @{ $ref_distributio +n }; my $k = -1; my $rx_every_second = join '|', grep $k ++ % 2 == 0, @{ $ref_array + }; s/\b(?:$rx_every_second)\b/./g for @{ $ref_distribution }; }
OUTPUT:
1) A A A . . . C C C . . . E E E E E . . . . . G G G G G . . . . . I I I +I I I I I I I A A A . . . C C C . . . E E E E E . . . . . G G G G G G . . . . . . I +I I I I I I A A A . . . C C C . . . E E E E E . . . . . G G G G G G G . . . . . . +. I I I I I I A A A . . . C C C C . . . . E E E E E . . . . . G G G G G G . . . . . +. I I I I I A A A A . . . . C C C C . . . . E E E E E . . . . . G G G G G . . . . +. I I I I A A A A A . . . . . C C C C C . . . . . E E E E E . . . . . G G G G . +. . . I I I A A A A A A . . . . . . C C C C C C . . . . . . E E E E . . . . G G G +. . . I I A A A A A A A . . . . . . . C C C C C C C . . . . . . . E E E . . . G +G G . . . I I A A A A A A A . . . . . . . C C C C C C C . . . . . . . E E E . . . G +G . . I I 2) A A A A . . . . C C C C . . . . . E E E E . . . . . G G G G G . . . . +I I I I I A A A A . . . . C C C C C . . . . . E E E E E . . . . G G G G G . . . +. . I I I I A A A A A . . . . . C C C C . . . . . E E E E . . . . G G G G . . . . +. I I I I A A A A A . . . . C C C C C . . . . E E E E E . . . . G G G G . . . . +I I I I I A A A A A . . . . . C C C C C . . . . E E E E E . . . . G G G G . . . +. . I I I I I A A A A A . . . . C C C C . . . . . E E E E E . . . . . G G G G . . . +. I I I I I A A A A . . . . C C C C . . . . . E E E E . . . . . G G G G . . . . . +I I I I A A A A . . . . . C C C C . . . . . E E E E . . . . G G G G G . . . . +I I I I A A A A . . . . . C C C C . . . . E E E E . . . . . G G G G G . . . . +. I I I I
One can vary a value of variable $size_of_array to see how columns change with different array sizes. In the output example (with $size_of_array = 9) we can see non-vertical columns in '1)' (pseudo-shuffle with sort) and vertical columns in '2)' (Fisher-Yates shuffle). This data means that the first element of an array more often jumped to the end of an array, and the last element jumped more often to the beginning of an array.

Upd.: P.S. Note time complexity difference: mergesort is O(N log N), Fisher-Yates shuffle is O(N).

Replies are listed 'Best First'.
Re: RFC: Kinda pseudo-shuffle using sort
by ikegami (Patriarch) on Mar 15, 2021 at 10:22 UTC

    Why?


    From sort,

    The comparison function is required to behave. If it returns inconsistent results (sometimes saying $x1 is less than $x2 and sometimes saying the opposite, for example) the results are not well-defined.

    It doesn't say undefined behaviour, so I guess you're in the clear as far as not crashing is concerned?

    That said, the "quality of the randonmess" can change at any time. sort's algorithm has been changed before, and could change again.

    Seeking work! You can reach me at ikegami@adaelis.com

Re: RFC: Kinda pseudo-shuffle using sort
by haukex (Archbishop) on Mar 15, 2021 at 10:23 UTC
    @array = sort { 0.5 <=> rand } @array;

    I wouldn't recommend this. From sort:

    The comparison function is required to behave. If it returns inconsistent results (sometimes saying $x[1] is less than $x[2] and sometimes saying the opposite, for example) the results are not well-defined.

    One could argue about whether "well-defined" simply means that the sort order isn't well-defined, or whether it means that something might blow up, but "The comparison function is required to behave." is a pretty clear statement.

    Update: I see ikegami beat me to it.

Re: RFC: Kinda pseudo-shuffle using sort
by bliako (Monsignor) on Mar 16, 2021 at 08:50 UTC
    Upd.: P.S. Note time complexity difference: mergesort is O(N log N), Fisher-Yates shuffle is O(N).

    I would consider this as a major deterrent. As well as the point made by the other answers.

    Since you touched the subject of unorthodox shuffles, perhaps you also want to consider shuffling by converting the array into a hash and back again. For example:

    my $pseudoshuffle_with_hash = sub { my( $ref_array ) = shift; @{ $ref_array } = keys %{ {map { $_ => undef } @{ $ref_array }} }; };

    bw, bliako

      Nice alternative!

      Though it can be used with simple arrays only (containing text and numbers, not references).

      I did the same "histogram" here with '$pseudoshuffle_with_hash'. And I'm getting different results every time, because 'srand(CONST)' doesn't influence hash. But outputs are similar. Kinda symmetric. I paste here two runs:

      3) A A A . . . C C C C C . . . . . E E E . . . G G G G G . . . . . . . . +. . I I I I I A A A A A . . . C C C C C C C C C C E E E . . . . . G G G G G I I I I +I I I I I A A A . . . . . C C C C C . . . . . E E E E E . . . . . . . . . . . . +I I I I I A A A A A . . . . . . . . . . . . . . E E E E E . . . . . G G G G G G +G G G A A A A A A A A A A . . . . . E E E E E . . . . . . . . . G G G G G G +G G G G A A A A A . . . . . . . . . . . . . . . . . . E E E E E E E E E E . . +. . . A A A . . . C C C C C . . . . . E E E . . . G G G G G . . . . . . . . +. I I I I I A A A A A . . . C C C C C C C C C C E E E . . . . . G G G G G I I I I +I I I I I A A A . . . . . C C C C C . . . . . E E E E E . . . . . . . . . . . . +. I I I I I
      3) A A A A A . . . C C C . . . E E E E E . . . G G G G G . . . . . I I I +I I I I I I I A A A A A . . . . . . . C C C C C C C C . . . . . E E E E E . . . . . +. . . . . . . . . . . . C C C C C C C . . . . . . . . . . G G G G G G G G G . . +. . . A A A A A . . . C C C . . . . . E E E E E . . . . . . . . . . I I I I +I I I I I I A A A A A A A A A A . . . . . E E E E E E E E E E . . . . . G G G G G +G G G G A A A A A . . . C C C . . . . . E E E E E . . . . . . . . . . I I I I +I I I I I . . . . . . . C C C C C C C . . . . . . . . . . G G G G G G G G G G . +. . . . A A A A A . . . . . . . C C C C C C C . . . . . E E E E E . . . . . . +. . . . A A A A A . . . C C C . . . E E E E E . . . G G G G G . . . . . I I I +I I I I I I