in reply to Creating an array of unique numbers
I decided to Benchmark the original routine I presented, along with lucs code, and a modification to it. Using the shuffle method is clearly the least efficient, under every test case. lucs method is really slick, and I was able to boost the speed with a little enhancing. I wasn't up to trying to code up tye's method. Below are the benchmarks for the various methods.
The speed difference is consistently 2.9 times from fastest to slowest. I don't know how the shuffle() routine works, but it doesn't seem to be affected by the size of the data set. Since I do know how lucs method works, I didn't expect it to be non-linear (and it's not).
An interesting note in the 'test_jcwren' vs. 'test_jcwren2' tests. The only difference is assigning 'undef' instead of 0 to the hash element. Since exists() knows that 'undef' is a legal value for a hash value, we can shave a significant amount of time off of 100,000 interations. In fact, simply assigning 'undef' unstead of 0 account for 8 CPU seconds. I guess because the interpeter doesn't have to create a scalar and do the conversion, it's a little faster.
I've also left some code in their if you want to check the output and distributions of the random numbers. I hacked tilly's histogram code a previous node. All 4 implementations appear to have the same distribution, and produce values across the requires span.
100 Lines, 30 Questions Benchmark: timing 100000 iterations of test_jcwren, test_lucs, test_lu +csjcwren, test_lucsjcwren2... test_jcwren: 73 wallclock secs (43.26 usr + 0.08 sys = 43.34 CPU +) test_lucs: 30 wallclock secs (17.27 usr + 0.01 sys = 17.28 CPU +) test_lucsjcwren: 28 wallclock secs (16.32 usr + 0.00 sys = 16.32 CPU +) test_lucsjcwren2: 26 wallclock secs (15.61 usr + 0.01 sys = 15.62 CPU +) 500 Lines, 150 Questions Benchmark: timing 100000 iterations of test_jcwren, test_lucs, test_lu +csjcwren, test_lucsjcwren2... test_jcwren: 379 wallclock secs (217.85 usr + 0.15 sys = 218.00 +CPU) test_lucs: 145 wallclock secs ( 82.29 usr + 0.19 sys = 82.48 +CPU) test_lucsjcwren: 137 wallclock secs ( 79.63 usr + 0.14 sys = 79.77 +CPU) test_lucsjcwren2: 128 wallclock secs ( 75.22 usr + 0.02 sys = 75.24 +CPU) 1000 Lines, 300 Questions Benchmark: timing 100000 iterations of test_jcwren, test_lucs, test_lu +csjcwren, test_lucsjcwren2... test_jcwren: 754 wallclock secs (438.51 usr + 0.50 sys = 439.01 +CPU) test_lucs: 291 wallclock secs (168.96 usr + 0.12 sys = 169.08 +CPU) test_lucsjcwren: 277 wallclock secs (162.44 usr + 0.04 sys = 162.48 +CPU) test_lucsjcwren2: 267 wallclock secs (154.43 usr + 0.11 sys = 154.54 +CPU)
--Chris#!/usr/local/bin/perl -w use strict; use Algorithm::Numerical::Shuffle qw(shuffle); use List::Util qw(min max sum); use POSIX qw(ceil floor); use Benchmark; my $Max_Questions = 0; my $Max_Lines = 0; { my $results; my %hash; for (([100,30]), ([500,150]), ([1000,300])) { $Max_Lines = @$_[0]; $Max_Questions = @$_[1]; printf ("%d Lines, %d Questions\n", $Max_Lines, $Max_Questions); timethese (100000, { 'test_jcwren' => sub { test_jcwren () + }, 'test_lucs' => sub { test_lucs () + }, 'test_lucsjcwren' => sub { test_lucsjcwren +() }, 'test_lucsjcwren2' => sub { test_lucsjcwren2 + () }, } ); print "\n"; } =cut # # Be sure to set $Max_Lines and $Max_Questions at the top # for (0..1000) { $results = test_lucsjcwren (); $hash {$_}++ foreach (@$results); } show_array ($results); show_histogram (1, \%hash); =cut } # # Test cases # sub test_jcwren { return \@{[(shuffle (0..$Max_Lines-1)) [0..$Max_Questions-1]]}; } sub test_lucs { my $m = $Max_Questions; my $j = $Max_Lines - $m + 1; my %sample; while ($m-- > 0) { my $val = int ($j * rand (1)); $sample{ exists $sample{$val} ? $j - 1 : $val } = 0; ++$j; } return (\@{[keys %sample]}); } sub test_lucsjcwren { my $m = $Max_Questions; my $j = $Max_Lines - $m + 1; my %sample; while ($m--) { my $val = int (rand ($j++)); $sample {exists $sample{$val} ? $j - 2 : $val} = 0; } return (\@{[keys %sample]}); } sub test_lucsjcwren2 { my $m = $Max_Questions; my $j = $Max_Lines - $m + 1; my %sample; while ($m--) { my $val = int (rand ($j++)); $sample {exists $sample{$val} ? $j - 2 : $val} = undef; } return (\@{[keys %sample]}); } # # Utilities # sub show_array { my $array = shift; my @sorted = sort {$a <=> $b} @$array; print"Element\t\tUnsorted\tSorted\n"; print"-------\t\t--------\t------\n"; for (my $z = 0; $z < $Max_Questions; $z++) { print" $z"; print"\t\t @$array[$z]\t\t"; print" $sorted[$z]\n"; } } sub show_histogram { my $bin_size = shift; my $articles = shift; my $width = 50; my $max_count = max (values %$articles); my $scale = ceil($max_count / $width); print " Index Count\n"; print "------------- -------", "-" x 50, "\n"; my @bins = sort {$a <=> $b} keys %$articles; foreach my $bin (min(@bins)..max(@bins)) { my $count = $articles->{$bin} || 0; my $extra = ($count % $scale) ? '.' : ''; my $start = $bin * $bin_size; my $end = $start + $bin_size - 1; printf "%4d .. %4d \[%4d\] %s$extra\n", $start, $end, $count, '#' x floor ($count / $scale); } print "\n Scale: #=$scale\n\n" if $scale > 1; }
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: (jcwren) Re: Creating an array of unique numbers (Benchmarks!)
by archon (Monk) on Mar 28, 2001 at 02:14 UTC | |
by tye (Sage) on Mar 28, 2001 at 02:38 UTC | |
by archon (Monk) on Mar 28, 2001 at 03:38 UTC | |
by tilly (Archbishop) on Mar 28, 2001 at 09:30 UTC |