Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re: When the Best Solution Isn't

by BrowserUk (Patriarch)
on Sep 23, 2002 at 13:14 UTC ( [id://200083]=note: print w/replies, xml ) Need Help??


in reply to When the Best Solution Isn't

That weird shuffle sort (or desort might be a better term) does amazingly well on speed, and surprisingly, under at least some circumstances can actually also produce all possible combinations. Analyzing the distribution shows it real weakness. The story being told by the standard deviation figures below.

thraxil and aristotles corrected forms do much here, but at the expense of the loss of speed. I didn't see japhy's version till my tests were complete so I haven't included it.

However, a well crafted Fischer-Yates shuffles out does both of the "random sort" solutions for sheer performance by a good margin, being twice as fast with with arrays of 1000 elements and nearly 5 times as fast once you get to 100,000 elements.

It also seems to do the best on the distribution test, but that maybe simply down to the random number generator rather then the algorithm. That's too deep statistical voodoo for me to determine.

Update: Forgot to mention, the Fischer-Yates is an in-place sort so minimum memory usage too.

#!/usr/bin/perl -w use strict; use Benchmark qw(cmpthese); use Data::Dumper; sub xform { map {$_->[0]} sort { $a->[1] <=> $b->[1]} map {[$_, rand(1)]} @_; } sub slice { my @random; push @random, rand 1 for 0 .. $#_; @_[ sort { $random[$a] <=> $random[$b] } 0 .. $#_ ]; } sub shufl { $a = $_ + rand @_ - $_ and @_[$_, $a] = @_[$a, $_] for (0..$#_); return @_; } sub qshuf { sort { .5 <=> rand(1) } @_; } my @array = 1 .. 1000; cmpthese(10, { slice => sub { slice @array }, xform => sub { xform @array }, shufl => sub { shufl @array }, qshuf => sub { qshuf @array }, }); my (%buckets, %d, @temp);; my @set = qw(A B C D); for (1 .. 100_000 ) { $buckets{"@{[slice @temp=@set]}"}{slice}++; $buckets{"@{[xform @temp=@set]}"}{xform}++; $buckets{"@{[shufl @temp=@set]}"}{shufl}++; $buckets{"@{[qshuf @temp=@set]}"}{qshuf}++; } print "\npermutation | slice | xform | shufl | qshuf \n"; print "--------------------------------------------------\n"; for my $key (sort keys %buckets) { printf "%8.8s: | %4d | %4d | %4d | %4d\n", $key, $buckets{$key}{slice}, $buckets{$key}{xform}, $buckets{$key}{shufl}, $buckets{$key}{qshuf}; $d{slice}{Ex} += $buckets{$key}{slice}; $d{slice}{Ex2} += $buckets +{$key}{slice}**2; $d{xform}{Ex} += $buckets{$key}{xform}; $d{xform}{Ex2} += $buckets +{$key}{xform}**2; $d{shufl}{Ex} += $buckets{$key}{shufl}; $d{shufl}{Ex2} += $buckets +{$key}{shufl}**2; $d{qshuf}{Ex} += $buckets{$key}{qshuf}; $d{qshuf}{Ex2} += $buckets +{$key}{qshuf}**2; } print "---------------------------------------------------\n"; printf "Std. Dev. | %0.3f | %0.3f | %0.3f | %0.3f\n", sqrt( ($d{slice}{Ex2} - ($d{slice}{Ex}**2/24))/23 ), sqrt( ($d{xform}{Ex2} - ($d{xform}{Ex}**2/24))/23 ), sqrt( ($d{shufl}{Ex2} - ($d{shufl}{Ex}**2/24))/23 ), sqrt( ($d{qshuf}{Ex2} - ($d{qshuf}{Ex}**2/24))/23 ); __END__ C:\test>199981 Benchmark: timing 10000 iterations of qshuf, shufl, slice, xform... qshuf: 3 wallclock secs ( 3.04 usr + 0.01 sys = 3.04 CPU) @ 32 +84.07/s (n=10000) shufl: 217 wallclock secs (209.08 usr + 0.01 sys = 209.09 CPU) @ + 47.83/s (n=10000) slice: 435 wallclock secs (429.57 usr + 0.01 sys = 429.58 CPU) @ + 23.28/s (n=10000) xform: 716 wallclock secs (693.68 usr + 0.00 sys = 693.68 CPU) @ + 14.42/s (n=10000) Rate xform slice shufl qshuf xform 14.4/s -- -38% -70% -100% slice 23.3/s 61% -- -51% -99% shufl 47.8/s 232% 105% -- -99% qshuf 3284/s 22681% 14008% 6767% -- permutation | slice | xform | shufl | qshuf -------------------------------------------------- A B C D: | 4322 | 4277 | 4127 | 12320 A B D C: | 4127 | 4115 | 4143 | 6134 A C B D: | 4284 | 4185 | 4156 | 6430 A C D B: | 4246 | 4083 | 4272 | 3094 A D B C: | 4205 | 4192 | 4062 | 3167 A D C B: | 4182 | 4128 | 4125 | 1597 B A C D: | 4143 | 4287 | 4246 | 12478 B A D C: | 4146 | 4156 | 4154 | 6273 B C A D: | 4027 | 4133 | 4133 | 6354 B C D A: | 4171 | 4153 | 4163 | 3092 B D A C: | 4191 | 4128 | 4201 | 3170 B D C A: | 4187 | 4233 | 4143 | 1546 C A B D: | 4088 | 4163 | 4170 | 6217 C A D B: | 4044 | 4197 | 4127 | 3190 C B A D: | 4214 | 4228 | 4114 | 6261 C B D A: | 4169 | 4021 | 4260 | 3080 C D A B: | 4069 | 4075 | 4185 | 1480 C D B A: | 4120 | 4102 | 4185 | 1533 D A B C: | 4177 | 4151 | 4199 | 3037 D A C B: | 4248 | 4207 | 4198 | 1608 D B A C: | 4175 | 4252 | 4203 | 3087 D B C A: | 4135 | 4173 | 4198 | 1641 D C A B: | 4203 | 4157 | 4098 | 1620 D C B A: | 4127 | 4204 | 4138 | 1591 --------------------------------------------------- Std. Dev. | 70.671 | 64.372 | 50.640 | 3127.055

Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!

Replies are listed 'Best First'.
Re^2: When the Best Solution Isn't
by Anonymous Monk on Apr 01, 2005 at 09:29 UTC
    sub qshuf { sort { .5 <=> rand(1) } @_; } cmpthese(10, { slice => sub { slice @array }, xform => sub { xform @array }, shufl => sub { shufl @array }, qshuf => sub { qshuf @array }, }); Rate xform slice shufl qshuf xform 14.4/s -- -38% -70% -100% slice 23.3/s 61% -- -51% -99% shufl 47.8/s 232% 105% -- -99% qshuf 3284/s 22681% 14008% 6767% --
    Mindboggling. "qshuf", an N log N algorithm, shows to be 67 faster than any of linear algorithms, and you don't raise a flag?

    Your benchmark is extremely flawed. You're running your sort in void context. Perl, being smart, knows that sort in void context isn't going to do anything useful, so it's not going to work up a sweat, and just won't do anything.

    Benchmark: timing 10000 iterations of qshuf, shufl, slice, xform
    Hmmm, with a first argument of '10' to 'cmpthese'? Either your output doesn't match your program, or your Benchmark.pm module is different than mine.
      Mindboggling. "qshuf", an N log N algorithm, ...

      Hmm. N log N with N=4 == 2.408. How does that affect your boogle? The main part of the post was the statistical analysis, but you are correct, the void context does make a difference to the timing and the output posted was obviously an accumulation from 2 different runs of the program which had been modified in the interim.

      Hmmm, with a first argument of '10' to 'cmpthese'? Either your output doesn't match your program

      It was a long time ago, so I am guessing more than remembering, but I think what probably happened was that I wrote the benchmark code first, ran it (with the iters counter set to 10000) and c&p'd the output to the source file.

      Seeing that qshuf() seemed very fast, (I probably wasn't even aware that sort silently optimised for void context rather than doing something sensible like issuing the "Useless in a void context" warning as other parts of perl do). I then went on to add the statistical analysis, and dropped the iterations counter to 10 to save time while getting the formatting of the output right.

      Here is a an updated version (run on 5.6.1 as the original was):

      And by way of recompense, here's an updated version incorporating List::Util::shuffle run on 5.8

      Thanks for pointing out the discrepancies--even if it did take 2 years for collected populace to notice :)


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco.
      Rule 1 has a caveat! -- Who broke the cabal?
        Hmm. N log N with N=4 == 2.408. How does that affect your boogle?
        Not at all. Three points:
        1. In CS, when people talk about log, they mean 2log, and 4 2log 4 == 8.
        2. I don't know what significance 4 has. The array has a 1000 elements, so N equals 1000, so N log N would be about 9966.
        3. But numbers don't mean anything. When we talk about a linear, or a quadratic, or a N log N, or some other function of N algorithm, we talk about how it scales. Any constants involved are ignored.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://200083]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (3)
As of 2024-04-24 19:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found