in reply to Re: Pick k numbers at random
in thread Pick k numbers at random

Shuffling an array and then picking the first k elements would do what I want, but that's still linear in n. The "sample" module seems to be efficient, though.

If you're worried about efficiency, you should measure first.

#!/usr/bin/env perl use warnings; use strict; use Math::Prime::Util qw/randperm/; use List::Util qw/shuffle/; use Algorithm::Numerical::Sample qw/sample/; use Benchmark qw/cmpthese/; die "Usage: $0 N K\n" unless @ARGV==2; my ($N,$K) = @ARGV; cmpthese(-2, { mpu => sub { my @range = randperm($N, $K); }, shuf => sub { my @range = (shuffle 0 .. $N-1)[0 .. $K - 1]; }, samp => sub { my @range = sample (-set => [0 .. $N-1], -sample_size => $K); }, }); __END__ $ perl bench.pl 10 5 Rate samp shuf mpu samp 326934/s -- -86% -91% shuf 2388541/s 631% -- -31% mpu 3456068/s 957% 45% -- $ perl bench.pl 100 5 Rate samp shuf mpu samp 76377/s -- -86% -98% shuf 539280/s 606% -- -86% mpu 3744910/s 4803% 594% -- $ perl bench.pl 10000 5 Rate samp shuf mpu samp 835/s -- -86% -100% shuf 5885/s 605% -- -100% mpu 3719610/s 445489% 63101% -- $ perl bench.pl 100000 5000 Rate samp shuf mpu samp 72.2/s -- -87% -99% shuf 562/s 678% -- -91% mpu 6544/s 8958% 1064% --

Not surprising, since Math::Prime::Util's randperm is implemented in C, and has a bunch of different methods for picking K of N depending on set sizes. See the source.

Replies are listed 'Best First'.
Re^3: Pick k numbers at random -- hash keys
by Discipulus (Canon) on Nov 12, 2019 at 20:48 UTC
    Hello haukex,

    just for fun a solution exploiting the randomness of hash keys.

    hkey => sub{ my @range = (keys %{+{map {($_ => 1)}0 .. $N-1}})[0 .. $K-1]; }

    It is slightly faster than samp for very small sets..

    perl randomshuffle.pl 10 5 Rate samp hkey shuf mpu samp 108393/s -- -13% -88% -92% hkey 124178/s 15% -- -87% -90% shuf 934174/s 762% 652% -- -27% mpu 1275273/s 1077% 927% 37% --
    ..but becomes fastly slower ;) for bigger ones.
    perl randomshuffle.pl 100 5 Rate hkey samp shuf mpu hkey 15844/s -- -46% -93% -99% samp 29546/s 86% -- -86% -98% shuf 212991/s 1244% 621% -- -86% mpu 1519656/s 9491% 5043% 613% --

    The only thing to note is the %{+{ LIST }} syntax, where + is used to disambiguate a hashref from a block (credit: perl IRC channel) because you can't dereference a map as a hash.

    PS: your hardware is ~3 times faster than mine ;)

    PPS: "randperm" is not exported by the Math::Prime::Util module in 0.60 so i upgraded to 0.73

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      For small hashes, ordering isn't necessarily "random" from one hash to the next:

      c:\@Work\Perl\monks>perl -wMstrict -le "for (1 .. 8) { my @k = keys %{ { map { $_ => 1 } 0 .. 8 } }; print qq{@k}; } " 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5
      Same results under ActiveState 5.8.9 and Strawberry 5.14.4.1.


      Give a man a fish:  <%-{-{-{-<

        For small hashes, ordering isn't necessarily "random" from one hash to the next

        I believe it's the version of perl that's important here.
        IIRC, with perl-5.16 and earlier the order was always repeated, but for later versions of perl the order varies at random (for some meaning of "at random").
        For example, with Strawberry Perl 5.18.2.1 your one liner produces for me:
        0 4 5 3 2 8 1 7 6 3 5 0 4 6 7 1 8 2 7 6 2 8 1 5 3 0 4 6 7 8 1 2 5 3 0 4 7 6 2 8 1 5 3 4 0 3 5 0 4 6 7 1 8 2 7 6 2 1 8 5 3 0 4 3 5 4 0 7 6 2 1 8
        and the output changes again when I re-run it.

        Interestingly, it's not strictly random - 0, 3, 4 and 5 always appear (in varying orders) either at the beginning or the end of the sequence.

        (I think it's still possible, via Configure args, to build perl with the pre-5.18 behaviour, though this is rarely done.)

        Cheers,
        Rob
Re^3: Pick k numbers at random
by ikegami (Patriarch) on Nov 14, 2019 at 18:19 UTC

    Not surprising, since Math::Prime::Util's randperm is implemented in C

    I think it's more about the fact that mpu creates $K scalars, while the other two creates at least $N.

    That's why the performance of shuf approaches that of mpu as $K approaches $N.