in reply to Re: Is this a fair shuffle?
in thread Is this a fair shuffle?

I guess I should have suspected that it had been done before. Thanks for the link. Good benchmark there. Incidentally, if you change the qshuf sub to be
sub qshuf {
    sort { .5 <=> rand(1) }
    sort { .5 <=> rand(1) }
    sort { .5 <=> rand(1) }
    sort { .5 <=> rand(1) }
    @_;
}
The Std Dev drops to competitive levels, but the performance drops to last place. Two or three calls to sort aren't sufficient. It was surprising to me that two chained sorts take so much more time than one.
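For anyone who wants to reproduce the timing observation, a rough Benchmark sketch along these lines should do (this is not the benchmark from the linked node; the list assignment keeps Perl from optimizing the sort away):

use strict;
use warnings;
use Benchmark qw(cmpthese);

my @array = 1 .. 1000;

cmpthese( -5, {
    one_sort  => sub { () = sort { .5 <=> rand(1) } @array },
    two_sorts => sub {
        () = sort { .5 <=> rand(1) } sort { .5 <=> rand(1) } @array;
    },
} );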

Caution: Contents may have been coded under pressure.

Re^3: Is this a fair shuffle?
by BrowserUk (Patriarch) on Apr 01, 2005 at 04:56 UTC

    I've often wondered how using a random "constant" would change things, but I never got around to trying it. It would also be good to add List::Util::shuffle to the benchmark. I assume it would beat my Perl implementation hands down for small arrays, but mine is probably still quicker for large arrays because it operates in place?
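    For the record, List::Util::shuffle returns a shuffled copy rather than working on the array in place, so a minimal usage sketch would be:

    use List::Util qw(shuffle);

    my @deck  = 1 .. 52;
    my @mixed = shuffle @deck;   # @deck itself is left untouched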


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?
      I just looked again, and your shufl shuffles things in place. That's not fair. But it doesn't seem to make much difference in the benchmark time. I also tried changing qshuf so that its comparator is no longer inconsistent, like so:
      sub qshuf { my %hash; sort { $hash{$a}{$b}||=(.5 <=> rand 1) } @_; }
      That is, it caches the value for each pairwise comparison, but it still doesn't shuffle well. Of course, there is the possibility of a > b > c > a...
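      Just to make that concrete, here is a tiny sketch (the comparison values are hand-picked, not produced by qshuf) of a cached table that encodes a > b > c > a:

      use strict;
      use warnings;

      my %cmp = (
          a => { b =>  1, c => -1 },   # a > b, a < c
          b => { a => -1, c =>  1 },   # b < a, b > c
          c => { a =>  1, b => -1 },   # c > a, c < b  -- i.e. a > b > c > a
      );
      # No ordering can satisfy all three constraints, so sort's result
      # (and hence the "shuffle") is not well defined.
      print join( ' ', sort { $cmp{$a}{$b} } qw(a b c) ), "\n";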

      Update: I have a winner!

      sub qshuf {
          my %hash;
          sort { ( $hash{$a} ||= rand 1 ) <=> ( $hash{$b} ||= rand 1 ) } @_;
      }

                Rate   xform   slice   shufl   qshuf
      xform   65.6/s      --    -34%    -72%   -100%
      slice   99.7/s     52%      --    -57%   -100%
      shufl    233/s    255%    133%      --    -99%
      qshuf  44494/s  67764%  44534%  19019%      --

      permutation |  slice  |  xform  |  shufl  |  qshuf
      ---------------------------------------------------
      Std. Dev.   | 69.974  | 58.895  | 53.532  | 62.689

      Caution: Contents may have been coded under pressure.
        sub qshuf { my %hash; sort { ($hash{$a}||=rand 1) <=> ($hash{$b}||=rand 1) } @_; }
        That's close, but try it on a list with repetitions:
        for (1 .. 10) { print qshuf(qw/a b c D D D D/), $/; }
        All the D's in the list get the same random value, so they all tie each other in the comparisons and end up adjacent. You are better off using the Schwartzian Transform I outlined above, where each array position gets a unique random number.
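        For reference, a sketch of that transform (the name st_shuffle is mine; it is essentially the xform sub from the benchmark further down), where every position gets its own random key so duplicates no longer tie:

        sub st_shuffle {
            map  { $_->[0] }                 # strip the random key back off
            sort { $a->[1] <=> $b->[1] }     # order by the random key
            map  { [ $_, rand ] }            # pair each element with its own key
            @_;
        }

        print st_shuffle(qw/a b c D D D D/), $/ for 1 .. 10;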

        blokhead

        That's not fair.

        I guess it depends upon what the aim is. If you simply want an array shuffled quickly, doing it in place is totally fair.

        But it doesn't seem to make a lot of difference in the benchmark time.

        As the arrays being shuffled only have 4 elements, the in-placeness makes almost no difference, but once you start shuffling arrays of a few hundred items, it has a fairly dramatic effect.
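        One way to see the in-place effect is a sketch along these lines, using the same shufl sub discussed in this thread: one benchmark entry hands shufl a fresh copy each time, so it pays the copying cost that the list-returning subs pay.

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        sub shufl {
            $a = $_ + rand @_ - $_ and @_[ $_, $a ] = @_[ $a, $_ ] for ( 0 .. $#_ );
            return @_;
        }

        my @array = 1 .. 1000;

        cmpthese( -5, {
            in_place => sub { () = shufl @array },
            copying  => sub { my @copy = @array; () = shufl @copy },
        } );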


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco.
        Rule 1 has a caveat! -- Who broke the cabal?
        And you really think numbers like "19019%" mean you have a brilliant algorithm? Just accept that, don't ask any questions?

        Think again. Next time, try running your sort in list context instead of void context (where Perl will just optimize the entire sort away) and see what happens:

        #!/usr/bin/perl -w
        use strict;
        use Benchmark qw(cmpthese);
        use Data::Dumper;
        use List::Util qw /shuffle/;

        sub xform {
            map  { $_->[0] }
            sort { $a->[1] <=> $b->[1] }
            map  { [ $_, rand(1) ] } @_;
        }

        sub slice {
            my @random;
            push @random, rand 1 for 0 .. $#_;
            @_[ sort { $random[$a] <=> $random[$b] } 0 .. $#_ ];
        }

        sub shufl {
            $a = $_ + rand @_ - $_ and @_[ $_, $a ] = @_[ $a, $_ ] for ( 0 .. $#_ );
            return @_;
        }

        sub qshuf {
            my %hash;
            sort { ( $hash{$a} ||= rand 1 ) <=> ( $hash{$b} ||= rand 1 ) } @_;
        }

        my @array = 1 .. 1000;
        cmpthese( -10, {
            slice => sub { () = slice @array },
            xform => sub { () = xform @array },
            shufl => sub { () = shufl @array },
            qshuf => sub { () = qshuf @array },
            lutil => sub { () = shuffle @array },
        } );
        __END__
                 Rate qshuf xform slice shufl lutil
        qshuf  90.7/s    --  -12%  -36%  -66%  -97%
        xform   103/s   13%    --  -27%  -61%  -96%
        slice   141/s   55%   37%    --  -47%  -95%
        shufl   264/s  191%  157%   87%    --  -91%
        lutil  2870/s 3066% 2694% 1939%  988%    --
        Not a winner. Not anywhere near the winner either.

      It would also be good to add List::Util::shuffle to the benchmark. I assume it would beat my perl implementation hands down for small arrays, but mine is probably still quicker for large arrays because it operates in place?

      If you are referring to the Perl implementation of List::Util::shuffle, then yes, a Perl implementation of FY (like yours) wins; but naturally the C implementation of List::Util::shuffle (which is the default) handily beats a Perl implementation of FY (lup = List::Util::shuffle in Perl; luc = List::Util::shuffle in C; fyp = Fisher-Yates in Perl):

              Rate   lup   fyp   luc
      lup   1.64/s    --  -12%  -83%
      fyp   1.87/s   14%    --  -80%
      luc   9.43/s  476%  406%    --
      That's for shuffling an array of 10^5 elements.
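      The exact lup/fyp/luc subs aren't shown here, but a sketch of the fyp-vs-luc half of that comparison would look something like this (fyp here is a generic pure-Perl Fisher-Yates, not necessarily the one timed above):

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);
      use List::Util qw(shuffle);        # loads the C implementation by default

      # Generic pure-Perl Fisher-Yates returning a shuffled copy.
      sub fyp {
          my @copy = @_;
          for my $i ( reverse 1 .. $#copy ) {
              my $j = int rand( $i + 1 );
              @copy[ $i, $j ] = @copy[ $j, $i ];
          }
          return @copy;
      }

      my @array = 1 .. 10**5;

      cmpthese( -10, {
          fyp => sub { () = fyp @array },
          luc => sub { () = shuffle @array },
      } );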

      the lowliest monk