Re^2: Pick k numbers at random

Shuffling an array and then picking the first k elements would do what I want, but that's still linear in n. The "sample" module seems to be efficient, though.

If you're worried about efficiency, you should measure first.

#!/usr/bin/env perl
use warnings;
use strict;
use Math::Prime::Util qw/randperm/;
use List::Util qw/shuffle/;
use Algorithm::Numerical::Sample qw/sample/;
use Benchmark qw/cmpthese/;

die "Usage: $0 N K\n" unless @ARGV==2;
my ($N,$K) = @ARGV;

cmpthese(-2, {
    mpu => sub {
        my @range = randperm($N, $K);
    },
    shuf => sub {
        my @range = (shuffle 0 .. $N-1)[0 .. $K - 1];
    },
    samp => sub {
        my @range = sample (-set => [0 .. $N-1], -sample_size => $K);
    },
});
__END__

$ perl bench.pl 10 5
          Rate samp shuf  mpu
samp  326934/s   -- -86% -91%
shuf 2388541/s 631%   -- -31%
mpu  3456068/s 957%  45%   --

$ perl bench.pl 100 5
          Rate  samp  shuf   mpu
samp   76377/s    --  -86%  -98%
shuf  539280/s  606%    --  -86%
mpu  3744910/s 4803%  594%    --

$ perl bench.pl 10000 5
          Rate    samp    shuf     mpu
samp     835/s      --    -86%   -100%
shuf    5885/s    605%      --   -100%
mpu  3719610/s 445489%  63101%      --

$ perl bench.pl 100000 5000
       Rate  samp  shuf   mpu
samp 72.2/s    --  -87%  -99%
shuf  562/s  678%    --  -91%
mpu  6544/s 8958% 1064%    --
[download]

Not surprising, since Math::Prime::Util's randperm is implemented in C, and has a bunch of different methods for picking K of N depending on set sizes. See the source.

Comment on Re^2: Pick k numbers at random Select or Download Code

Replies are listed 'Best First'.
Re^3: Pick k numbers at random -- hash keys by Discipulus (Canon) on Nov 12, 2019 at 20:48 UTC
Hello haukex, just for fun a solution exploiting the randomness of hash keys. `hkey => sub{ my @range = (keys %{+{map {($_ => 1)}0 .. $N-1}})[0 .. $K-1]; }` [download] It is slightly faster than `samp` for very small sets.. `perl randomshuffle.pl 10 5 Rate samp hkey shuf mpu samp 108393/s -- -13% -88% -92% hkey 124178/s 15% -- -87% -90% shuf 934174/s 762% 652% -- -27% mpu 1275273/s 1077% 927% 37% --` [download] ..but becomes fastly slower ;) for bigger ones. `perl randomshuffle.pl 100 5 Rate hkey samp shuf mpu hkey 15844/s -- -46% -93% -99% samp 29546/s 86% -- -86% -98% shuf 212991/s 1244% 621% -- -86% mpu 1519656/s 9491% 5043% 613% --` [download] The only thing to note is the `%{+{ LIST }}` syntax, where `+` is used to disambiguate a hashref from a block (credit: perl IRC channel) because you can't dereference a map as a hash. PS: your hardware is ~3 times faster than mine ;) PPS: `"randperm" is not exported by the Math::Prime::Util module` in 0.60 so i upgraded to 0.73 L* There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.	[reply] [d/l] [select]
Re^4: Pick k numbers at random -- hash keys by AnomalousMonk (Archbishop) on Nov 12, 2019 at 23:16 UTC
For small hashes, ordering isn't necessarily "random" from one hash to the next: `c:\@Work\Perl\monks>perl -wMstrict -le "for (1 .. 8) { my @k = keys %{ { map { $_ => 1 } 0 .. 8 } }; print qq{@k}; } " 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5 6 3 7 2 8 1 4 0 5` [download] Same results under ActiveState 5.8.9 and Strawberry 5.14.4.1. Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re^5: Pick k numbers at random -- hash keys by syphilis (Archbishop) on Nov 13, 2019 at 01:06 UTC
For small hashes, ordering isn't necessarily "random" from one hash to the next I believe it's the version of perl that's important here. IIRC, with perl-5.16 and earlier the order was always repeated, but for later versions of perl the order varies at random (for some meaning of "at random"). For example, with Strawberry Perl 5.18.2.1 your one liner produces for me: `0 4 5 3 2 8 1 7 6 3 5 0 4 6 7 1 8 2 7 6 2 8 1 5 3 0 4 6 7 8 1 2 5 3 0 4 7 6 2 8 1 5 3 4 0 3 5 0 4 6 7 1 8 2 7 6 2 1 8 5 3 0 4 3 5 4 0 7 6 2 1 8` [download] and the output changes again when I re-run it. Interestingly, it's not strictly random - 0, 3, 4 and 5 always appear (in varying orders) either at the beginning or the end of the sequence. (I think it's still possible, via Configure args, to build perl with the pre-5.18 behaviour, though this is rarely done.) Cheers, Rob	[reply] [d/l]
Re^6: Pick k numbers at random -- hash keys (randomness) by Discipulus (Canon) on Nov 13, 2019 at 08:52 UTC
Re^6: Pick k numbers at random -- hash keys by choroba (Cardinal) on Nov 13, 2019 at 15:55 UTC
Re^3: Pick k numbers at random by ikegami (Patriarch) on Nov 14, 2019 at 18:19 UTC
Not surprising, since Math::Prime::Util's randperm is implemented in C I think it's more about the fact that `mpu` creates `$K` scalars, while the other two creates at least `$N`. That's why the performance of `shuf` approaches that of `mpu` as `$K` approaches `$N`.	[reply] [d/l] [select]