in reply to (jcwren) Re: Creating an array of unique numbers (Benchmarks!)
in thread Creating an array of unique numbers

You made me curious, so I tested the algorithm I proposed and came up with these results:
100 Lines, 30 Questions Benchmark: timing 5000 iterations of test_archon, test_jcwren, test_lu +cs, test_lucsjcwren, test_lucsjcwren2... test_archon: 6 wallclock secs ( 5.40 usr + 0.00 sys = 5.40 CPU) test_jcwren: 19 wallclock secs (17.80 usr + 0.02 sys = 17.82 CPU) test_lucs: 7 wallclock secs ( 6.30 usr + 0.00 sys = 6.30 CPU) test_lucsjcwren: 6 wallclock secs ( 5.84 usr + 0.00 sys = 5.84 CPU) test_lucsjcwren2: 6 wallclock secs ( 5.55 usr + 0.01 sys = 5.56 CPU +) 500 Lines, 150 Questions Benchmark: timing 5000 iterations of test_archon, test_jcwren, test_lu +cs, test_lucsjcwren, test_lucsjcwren2... test_archon: 28 wallclock secs (26.94 usr + 0.01 sys = 26.95 CPU) test_jcwren: 93 wallclock secs (88.83 usr + 0.11 sys = 88.94 CPU) test_lucs: 32 wallclock secs (30.66 usr + 0.00 sys = 30.66 CPU) test_lucsjcwren: 30 wallclock secs (28.46 usr + 0.02 sys = 28.48 CPU) test_lucsjcwren2: 29 wallclock secs (26.97 usr + 0.03 sys = 27.00 CPU +) 1000 Lines, 300 Questions Benchmark: timing 5000 iterations of test_archon, test_jcwren, test_lu +cs, test_lucsjcwren, test_lucsjcwren2... test_archon: 58 wallclock secs (54.92 usr + 0.05 sys = 54.97 CPU) test_jcwren: 189 wallclock secs (179.66 usr + 0.13 sys = 179.79 CPU) test_lucs: 66 wallclock secs (63.03 usr + 0.05 sys = 63.09 CPU) test_lucsjcwren: 61 wallclock secs (58.53 usr + 0.05 sys = 58.58 CPU) test_lucsjcwren2: 57 wallclock secs (54.80 usr + 0.05 sys = 54.85 CPU +)
test_archon:
sub test_archon { my %numhash; for (1 .. $Max_Questions) { my $randnum = int(rand($Max_Lines) + 1); redo if exists ($numhash{$randnum}); $numhash{$randnum} = 1; } return [keys (%numhash)]; }

Edit 2001-03-26 by tye to remove <pre>

Replies are listed 'Best First'.
(tye)Re2: Creating an array of unique numbers (Benchmarks!)
by tye (Sage) on Mar 28, 2001 at 02:38 UTC

    You need to try your algorythm with a large list when you want all (or even nearly all) of the question numbers! (:

            - tye (but my friends call me "Tye")
      I expected much worse results, but here are the results of grabbing 900 unique numbers from a list of 1000:
      Benchmark: timing 5000 iterations of test_archon, test_jcwren, test_lu +cs, test_lucsjcwren, test_lucsjcwren2... test_archon: 259 wallclock secs (242.45 usr + 0.14 sys = 242.59 CPU) test_jcwren: 213 wallclock secs (202.54 usr + 0.16 sys = 202.70 CPU) test_lucs: 285 wallclock secs (267.02 usr + 0.27 sys = 267.28 CPU) test_lucsjcwren: 268 wallclock secs (248.88 usr + 0.24 sys = 249.12 C +PU) test_lucsjcwren2: 255 wallclock secs (231.91 usr + 0.21 sys = 232.12 +CPU)
      This is where jcwren's algorithm begins to outperform lucs and mine. I'm surprised mine is still about as fast as lucs', though.
        If you are doing a fixed portion k of an array of size n, your algorithm is O(n). The constant gets worse the closer that k is to 1. Unless my back of the envelope calculation is wrong your constant is proportional to k-log(1-k) where that logarithm is the natural log. Throw in the fact that you are only selecting k*n elements that works out to your selecting things 1-(log(1-k)/k) times for every output (on average). If I put in k = 0.9 that works out to be a factor of about 3.56.

        So while if you are selecting all but a few elements, your approach is slow, if you are selecting up to a fixed proportion, it should be just fine.