in reply to Generating 0 .. N Randomly and Efficiently
Generate a randomly permuted M-length subset of the integers in the range 0..N.What makes this variant especially interesting is that the trivial solutions based upon shuffling 0..N are no longer viable because of inefficiency. (Consider the case when M=5 and N=2^48.)
In Jon Bentley's book More Programming Pearls: Confessions of a Coder (ISBN 0201118890), he discusses this problem at length and describes one of Knuth's solutions before arriving at the following, beautiful algorithm by the late Robert W Floyd:
The algorithm runs in time O(M) and requires storage of only M cells. (My implementation actually requires 2M cells because, for easy Data Dumping, I copy the resulting sequence into an array before returning it, but this step isn't required. BTW, I use the term "cell" to represent the approximate storage required by a hash entry or an array element.)# The following algorithm is due to Bob Floyd and is # described in Jon Bentley's, _More Programming Pearls_ # (1988), on p. 143. # # In this Perl implementation, I use a hash as an efficient, # randomly-addressable series that supports insertion in # O(1) time. This implementation selects subsets of size $m # from the range [0,$n], without replacement. (The original # from Bentley's book used the range [1,n]). sub randomly_permuted_subset_of_range { my ($m, $n) = @_; my (@result, %series, $head); for (my $j = $n-$m+1; $j <= $n; $j++) { my $t = int rand($j+1); if (exists $series{$t}) { # insert $j into series after $t @series{$t,$j} = ($j, $series{$t}); } else { # prefix $t to series ($head, $series{$t}) = ($t, $head); } } for ($n = $head; defined $n; $n = $series{$n}) { push @result, $n; } return \@result; }
Some examples:
And the corresponding output:use Data::Dumper; $Data::Dumper::Indent = 0; print Dumper( randomly_permuted_subset_of_range(5, 10) ); print Dumper( randomly_permuted_subset_of_range(11, 10) ); print Dumper( randomly_permuted_subset_of_range(5, 2**48) );
Note that when M=N+1, Floyd's algorithm solves your original problem for any given N.$VAR1 = [2,1,7,0,5]; $VAR1 = [7,8,1,5,3,2,4,10,0,9,6]; $VAR1 = ['211572034652645','198222342410498', '42862092043890','253205057797894', '185109518737615'];
Cheers,
Tom
Tom Moertel : Blog / Talks / CPAN / LectroTest / PXSL / Coffee / Movie Rating Decoder
|
|---|