Chuma has asked for the wisdom of the Perl Monks concerning the following question:

I have a fairly theoretical question today. I'd like to randomly pick k integers less than n. The numbers should be distinct, and all sets of numbers equally likely. What's an efficient way to do that?

I could for example make an array of the numbers less than n, choose a number at random, and then splice it from the array. But that's hardly the optimal way, since it would take O(n) time to make the array that I don't really need. Another way would be to pick a random number, check if it's been picked before, and if so try again. That would be good enough if k << n, but generally it's not great. Yet another option would be that if the picked number has been picked before, you just try the next number after; that would have a better worst-case time, but it would make some sets more likely than others.
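For concreteness, here is a minimal sketch of the second idea (draw, check, redraw). The name `pick_by_rejection` is made up for illustration. Every draw is uniform and duplicates are simply discarded, so all k-subsets stay equally likely. The expected number of draws is n/n + n/(n-1) + ... + n/(n-k+1), which is close to k when k << n but grows to roughly n ln n when k = n.

```perl
use strict;
use warnings;

# Hypothetical helper: pick $k distinct integers from 0 .. $n-1 by
# redrawing whenever a value has already been seen.
sub pick_by_rejection {
    my ($n, $k) = @_;
    die "need 0 <= k <= n\n" if $k < 0 || $k > $n;
    my %seen;
    my @picked;
    while (@picked < $k) {
        my $x = int rand $n;
        next if $seen{$x}++;    # already picked, draw again
        push @picked, $x;
    }
    return @picked;
}

my @sample = pick_by_rejection(1_000_000, 5);
print "@sample\n";
```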

Replies are listed 'Best First'.
Re: Pick k numbers at random
by Athanasius (Archbishop) on Nov 12, 2019 at 13:06 UTC

    Hello Chuma,

    The core module List::Util has a shuffle function which does what you want:

    use strict;
    use warnings;
    use Const::Fast;
    use Data::Dump;
    use List::Util qw( shuffle );

    const my $N => 10;
    const my $K => 5;

    my @range = (shuffle 1 .. $N)[0 .. $K - 1];
    dd \@range;

    Sample output:

    23:04 >perl 2029_SoPW.pl
    [10, 9, 5, 6, 7]

    23:04 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Pick k numbers at random
by haukex (Archbishop) on Nov 12, 2019 at 13:13 UTC

    A quick Google search shows that this is a pretty well-covered topic, see e.g. Reservoir sampling. On CPAN I see Algorithm::Numerical::Sample, plus maybe Math::Prime::Util's randperm, which is essentially the same as Athanasius's example (although I suspect it might be a bit more efficient).
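As a rough illustration of the reservoir-sampling idea (the classic "Algorithm R" variant, not necessarily what any of the modules above do internally), here is a sketch. The name `reservoir_sample` and the iterator interface are assumptions made for this example. Note that it still visits all n items, so it saves memory (O(k) instead of O(n)) rather than time.

```perl
use strict;
use warnings;

# Hypothetical helper: keep a uniform sample of $k items from a stream
# whose items are produced one at a time by the iterator $next
# (which returns undef when the stream is exhausted).
sub reservoir_sample {
    my ($k, $next) = @_;
    my @reservoir;
    my $seen = 0;
    while (defined(my $item = $next->())) {
        $seen++;
        if (@reservoir < $k) {
            push @reservoir, $item;          # fill the reservoir first
        }
        else {
            my $j = int rand $seen;          # uniform in 0 .. $seen-1
            $reservoir[$j] = $item if $j < $k;
        }
    }
    return @reservoir;
}

# Stream the integers 0 .. $n-1 without building an array of them.
my $n = 1000;
my $i = 0;
my @sample = reservoir_sample(5, sub { $i < $n ? $i++ : undef });
print "@sample\n";
```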

    Update: Now with Benchmark:

    use warnings;
    use strict;
    use Math::Prime::Util qw/randperm/;
    use List::Util qw/shuffle/;
    use Benchmark qw/cmpthese/;

    my $N = 10;
    my $K = 5;

    cmpthese(-1, {
        mpu  => sub { my @range = randperm($N, $K); },
        shuf => sub { my @range = (shuffle 0 .. $N-1)[0 .. $K - 1]; },
    });

    __END__

              Rate shuf  mpu
    shuf 2496610/s   -- -31%
    mpu  3598054/s  44%   --

Re: Pick k numbers at random
by Chuma (Scribe) on Nov 12, 2019 at 14:09 UTC

    Thank you for your replies!

    Shuffling an array and then picking the first k elements would do what I want, but that's still linear in n. The "sample" module seems to be efficient, though.

      Shuffling an array and then picking the first k elements would do what I want, but that's still linear in n. The "sample" module seems to be efficient, though.

      If you're worried about efficiency, you should measure first.

      #!/usr/bin/env perl
      use warnings;
      use strict;
      use Math::Prime::Util qw/randperm/;
      use List::Util qw/shuffle/;
      use Algorithm::Numerical::Sample qw/sample/;
      use Benchmark qw/cmpthese/;

      die "Usage: $0 N K\n" unless @ARGV==2;
      my ($N,$K) = @ARGV;

      cmpthese(-2, {
          mpu  => sub { my @range = randperm($N, $K); },
          shuf => sub { my @range = (shuffle 0 .. $N-1)[0 .. $K - 1]; },
          samp => sub { my @range = sample (-set => [0 .. $N-1], -sample_size => $K); },
      });

      __END__

      $ perl bench.pl 10 5
                Rate samp shuf  mpu
      samp  326934/s   -- -86% -91%
      shuf 2388541/s 631%   -- -31%
      mpu  3456068/s 957%  45%   --

      $ perl bench.pl 100 5
                Rate  samp shuf  mpu
      samp   76377/s    -- -86% -98%
      shuf  539280/s  606%   -- -86%
      mpu  3744910/s 4803% 594%   --

      $ perl bench.pl 10000 5
                Rate    samp   shuf   mpu
      samp     835/s      --   -86% -100%
      shuf    5885/s    605%     -- -100%
      mpu  3719610/s 445489% 63101%    --

      $ perl bench.pl 100000 5000
             Rate  samp  shuf  mpu
      samp 72.2/s    --  -87% -99%
      shuf  562/s  678%    -- -91%
      mpu  6544/s 8958% 1064%   --

      Not surprising, since Math::Prime::Util's randperm is implemented in C, and has a bunch of different methods for picking K of N depending on set sizes. See the source.

        Hello haukex,

        just for fun a solution exploiting the randomness of hash keys.

        hkey => sub{ my @range = (keys %{+{map {($_ => 1)}0 .. $N-1}})[0 .. $K-1]; }

        It is slightly faster than samp for very small sets..

        perl randomshuffle.pl 10 5
                  Rate  samp hkey shuf  mpu
        samp  108393/s    -- -13% -88% -92%
        hkey  124178/s   15%   -- -87% -90%
        shuf  934174/s  762% 652%   -- -27%
        mpu  1275273/s 1077% 927%  37%   --

        ..but becomes fastly slower ;) for bigger ones.

        perl randomshuffle.pl 100 5
                  Rate  hkey  samp shuf  mpu
        hkey   15844/s    --  -46% -93% -99%
        samp   29546/s   86%    -- -86% -98%
        shuf  212991/s 1244%  621%   -- -86%
        mpu  1519656/s 9491% 5043% 613%   --

        The only thing to note is the %{+{ LIST }} syntax: the + makes Perl parse the inner braces as an anonymous hash constructor rather than a block (credit: perl IRC channel). It is needed because map returns a list, which can't be dereferenced as a hash directly.

        PS: your hardware is ~3 times faster than mine ;)

        PPS: "randperm" is not exported by the Math::Prime::Util module in 0.60 so i upgraded to 0.73

        L*

        There are no rules, there are no thumbs..
        Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

        Not surprising, since Math::Prime::Util's randperm is implemented in C

        I think it's more about the fact that mpu creates $K scalars, while the other two create at least $N.

        That's why the performance of shuf approaches that of mpu as $K approaches $N.

      If you're dealing with a massive N (billions of values, too many to allocate as an array) and a small K, here's a quick algorithm:
      my %remap;
      my @result;
      while ($K--) {
          my $x = int rand $N--;            # $N now indexes the last unpicked slot
          push @result, $remap{$x} // $x;
          $remap{$x} = $remap{$N} // $N;    # the last slot may itself have been remapped
      }

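The loop above is essentially a partial Fisher-Yates shuffle over a virtual array 0 .. N-1, with the hash storing only the slots that no longer hold their own index. Here is a sketch of the same idea as a subroutine that leaves its arguments untouched. The name `pick_k_sparse` is hypothetical. The detail that is easy to get wrong is the value moved into slot `$x`: it has to be whatever the last slot currently holds, which may itself be a remapped value.

```perl
use strict;
use warnings;

# Hypothetical helper: sparse partial Fisher-Yates over 0 .. $n-1.
# %remap records only the slots whose value differs from their index.
sub pick_k_sparse {
    my ($n, $k) = @_;
    die "need 0 <= k <= n\n" if $k < 0 || $k > $n;
    my (%remap, @result);
    for (1 .. $k) {
        my $x    = int rand $n;    # slot among the $n still unpicked
        my $last = --$n;           # index of the current last slot
        push @result, $remap{$x} // $x;
        # Move the value held by the last slot into slot $x, looking it
        # up in %remap in case that slot was itself overwritten earlier.
        $remap{$x} = $remap{$last} // $last;
        delete $remap{$last};      # slot $last is never visited again
    }
    return @result;
}

my @sample = pick_k_sparse(4_000_000_000, 5);
print "@sample\n";
```

Time and memory are both proportional to K, independent of N, assuming rand has enough precision for the chosen N.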
Re: Pick k numbers at random
by Anonymous Monk on Nov 13, 2019 at 00:13 UTC
    A fairly-stupid but actually very-effective recursive algorithm that I once encountered basically consisted of the following:
    function populate(lo, hi, depth) {
        if (lo >= hi) return;
        if (depth > MAX_DEPTH) return;
        midpoint = rand(lo, hi);
        add_to_result(midpoint);
        populate(lo, midpoint-1, depth+1);
        populate(midpoint+1, hi, depth+1);
    }

      You didn't understand the OP's concerns, you regurgitated a CS 102 solution that you "encountered", and you posted pseudocode instead of Perl.

      Is anyone surprised? I'm not.