You may wish to consider the following variant of your problem:
Generate a randomly permuted M-length subset of the integers in the range 0..N.
What makes this variant especially interesting is that the trivial solutions based upon shuffling 0..N are no longer viable because of inefficiency. (Consider the case when M=5 and N=2^48.)

In Jon Bentley's book More Programming Pearls: Confessions of a Coder (ISBN 0201118890), he discusses this problem at length and describes one of Knuth's solutions before arriving at the following, beautiful algorithm by the late Robert W Floyd:

# The following algorithm is due to Bob Floyd and is # described in Jon Bentley's, _More Programming Pearls_ # (1988), on p. 143. # # In this Perl implementation, I use a hash as an efficient, # randomly-addressable series that supports insertion in # O(1) time. This implementation selects subsets of size $m # from the range [0,$n], without replacement. (The original # from Bentley's book used the range [1,n]). sub randomly_permuted_subset_of_range { my ($m, $n) = @_; my (@result, %series, $head); for (my $j = $n-$m+1; $j <= $n; $j++) { my $t = int rand($j+1); if (exists $series{$t}) { # insert $j into series after $t @series{$t,$j} = ($j, $series{$t}); } else { # prefix $t to series ($head, $series{$t}) = ($t, $head); } } for ($n = $head; defined $n; $n = $series{$n}) { push @result, $n; } return \@result; }
The algorithm runs in time O(M) and requires storage of only M cells. (My implementation actually requires 2M cells because, for easy Data Dumping, I copy the resulting sequence into an array before returning it, but this step isn't required. BTW, I use the term "cell" to represent the approximate storage required by a hash entry or an array element.)

Some examples:

use Data::Dumper; $Data::Dumper::Indent = 0; print Dumper( randomly_permuted_subset_of_range(5, 10) ); print Dumper( randomly_permuted_subset_of_range(11, 10) ); print Dumper( randomly_permuted_subset_of_range(5, 2**48) );
And the corresponding output:
$VAR1 = [2,1,7,0,5]; $VAR1 = [7,8,1,5,3,2,4,10,0,9,6]; $VAR1 = ['211572034652645','198222342410498', '42862092043890','253205057797894', '185109518737615'];
Note that when M=N+1, Floyd's algorithm solves your original problem for any given N.

Cheers,
Tom


In reply to Re: Generating 0 .. N Randomly and Efficiently by tmoertel
in thread Generating 0 .. N Randomly and Efficiently by Limbic~Region

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.