Re: (jcwren) Re: Creating an array of unique numbers (Benchmarks!)

You made me curious, so I tested the algorithm I proposed and came up with these results:

100 Lines, 30 Questions
Benchmark: timing 5000 iterations of test_archon, test_jcwren, test_lu
+cs, test_lucsjcwren, test_lucsjcwren2...
test_archon:  6 wallclock secs ( 5.40 usr +  0.00 sys =  5.40 CPU)
test_jcwren: 19 wallclock secs (17.80 usr +  0.02 sys = 17.82 CPU)
 test_lucs:  7 wallclock secs ( 6.30 usr +  0.00 sys =  6.30 CPU)
test_lucsjcwren:  6 wallclock secs ( 5.84 usr +  0.00 sys =  5.84 CPU)
test_lucsjcwren2:  6 wallclock secs ( 5.55 usr +  0.01 sys =  5.56 CPU
+)

500 Lines, 150 Questions
Benchmark: timing 5000 iterations of test_archon, test_jcwren, test_lu
+cs, test_lucsjcwren, test_lucsjcwren2...
test_archon: 28 wallclock secs (26.94 usr +  0.01 sys = 26.95 CPU)
test_jcwren: 93 wallclock secs (88.83 usr +  0.11 sys = 88.94 CPU)
 test_lucs: 32 wallclock secs (30.66 usr +  0.00 sys = 30.66 CPU)
test_lucsjcwren: 30 wallclock secs (28.46 usr +  0.02 sys = 28.48 CPU)
test_lucsjcwren2: 29 wallclock secs (26.97 usr +  0.03 sys = 27.00 CPU
+)

1000 Lines, 300 Questions
Benchmark: timing 5000 iterations of test_archon, test_jcwren, test_lu
+cs, test_lucsjcwren, test_lucsjcwren2...
test_archon: 58 wallclock secs (54.92 usr +  0.05 sys = 54.97 CPU)
test_jcwren: 189 wallclock secs (179.66 usr +  0.13 sys = 179.79 CPU)
 test_lucs: 66 wallclock secs (63.03 usr +  0.05 sys = 63.09 CPU)
test_lucsjcwren: 61 wallclock secs (58.53 usr +  0.05 sys = 58.58 CPU)
test_lucsjcwren2: 57 wallclock secs (54.80 usr +  0.05 sys = 54.85 CPU
+)
[download]

test_archon:

sub test_archon
{
  my %numhash;
  for (1 .. $Max_Questions) {
    my $randnum = int(rand($Max_Lines) + 1);
    redo if exists ($numhash{$randnum});
    $numhash{$randnum} = 1;
  }
  return [keys (%numhash)];
}
[download]

Edit 2001-03-26 by tye to remove <pre>

Comment on Re: (jcwren) Re: Creating an array of unique numbers (Benchmarks!) Select or Download Code

Replies are listed 'Best First'.
(tye)Re2: Creating an array of unique numbers (Benchmarks!) by tye (Sage) on Mar 28, 2001 at 02:38 UTC
You need to try your algorythm with a large list when you want all (or even nearly all) of the question numbers! (: - tye (but my friends call me "Tye")	[reply]
Re: (tye)Re2: Creating an array of unique numbers (Benchmarks!) by archon (Monk) on Mar 28, 2001 at 03:38 UTC
I expected much worse results, but here are the results of grabbing 900 unique numbers from a list of 1000: `Benchmark: timing 5000 iterations of test_archon, test_jcwren, test_lu +cs, test_lucsjcwren, test_lucsjcwren2... test_archon: 259 wallclock secs (242.45 usr + 0.14 sys = 242.59 CPU) test_jcwren: 213 wallclock secs (202.54 usr + 0.16 sys = 202.70 CPU) test_lucs: 285 wallclock secs (267.02 usr + 0.27 sys = 267.28 CPU) test_lucsjcwren: 268 wallclock secs (248.88 usr + 0.24 sys = 249.12 C +PU) test_lucsjcwren2: 255 wallclock secs (231.91 usr + 0.21 sys = 232.12 +CPU)` [download] This is where jcwren's algorithm begins to outperform lucs and mine. I'm surprised mine is still about as fast as lucs', though.	[reply] [d/l]
Re (tilly) 5: Creating an array of unique numbers (Benchmarks!) by tilly (Archbishop) on Mar 28, 2001 at 09:30 UTC
If you are doing a fixed portion k of an array of size n, your algorithm is O(n). The constant gets worse the closer that k is to 1. Unless my back of the envelope calculation is wrong your constant is proportional to `k-log(1-k)` where that logarithm is the natural log. Throw in the fact that you are only selecting k*n elements that works out to your selecting things `1-(log(1-k)/k)` times for every output (on average). If I put in k = 0.9 that works out to be a factor of about 3.56. So while if you are selecting all but a few elements, your approach is slow, if you are selecting up to a fixed proportion, it should be just fine.	[reply]