comment on

You make one pass through the array. At each step you will make one swap. The idea is that at the first step you choose what element belongs at the end of the array. At the second step you choose the next to last element of the array out of what remained. And so on until you get to choosing the first element out of..gee..one element. :-)

Now how many times will you swap an element for itself? On average the last element gets swapped 1/n times, the next one 1/(n-1), etc down to 1/1. That is:

1 + 1/2 + 1/3 + ... + 1/n

Which is, after suitable rearranging:

1/2 + 1/2n +
     (1 + 1/2)/2 +
     (1/2 + 1/3)/2 +
     (1/3 + 1/4)/2 +
     .  .  .       +
     (1/(n-1) + 1/n)/2

Aside from the pieces at the ends, this turns out to be a sum of local trapezoidal approximations to the integral from 1 to n of 1/x. The error terms turn out to converge to a constant, and the integral that you are approximating is log(n)-log(1)=log(n). (Natural logs, of course, are the only kind that most mathematicians care about.)

Therefore on average the number of swaps you save is log(n) plus a complicated constant plus some small terms.

Now this is on average. What will happen realistically? Well instead of adding up numbers we are adding up random variables. The calculation this time makes the previous one I sketched seem trivial. (Note, the variables are not independent.) But glory halleluja! The variance of the sum for any number of terms turns out to be bounded by a constant (and not a big one either) so in fact you can put absolute upper bound on the likelyhood it varies by more than, say, 10 from that average. So you can say that within (eg) a 99% confidence interval it will lie within some fixed distance of log(n), and that estimate will be true for n=10, n=1000, n=1,000,000, and n=10^100.

Now how does this fit in big-O notation? Well first of all big-O notation as defined by Knuth doesn't really fit for random algorithms. Darn. (Easy enough to modify though.) Secondly people don't really use the word the same way that Knuth did. (eg Hash lookups as implemented in Perl are O(n). Everyone calls them O(1). Conversely O(n) algorithms are really O(n*n) as well, but we say, "That is O(n*n) so it is not a good algorithm.") Double darn. But hey, that is OK. After all big-O, little-o were invented by Bachmann in 1894 for number theory, not algorithms. (And were later popularized by Landau.) Language does change.

Besides which, your test (averaging repeated runs) is going to test average behaviour, and not the outliers.

Therefore whether or not you argue about the use of the phrase, "O(1)", my underlying comment is an accurate statement about what you are trying to measure. The benefit of the branch will be seen, on average, log(n) times plus some constant on an array of size n. Putting the logic in will cost you n times.

An incidental note. Perl compiles everything down to portable op codes, not assembler, and runs those codes through an interpreter. Therefore thinking about how a C compiler would solve a problem is missing the point. perl simply does not deal with the physical machine at the same level that, say, gcc does. That if check is going to be dealt with as an interpreted statement.

BTW take a look at my home node. I may have ranted a bit here, but that is because you wandered too close to what I am really good at. Namely math. :-)

EDIT
Good and out of practice. :-(

When I put it on paper the random variables actually were independent with variances of (n-1)/(n*n), so the variance of the sum is again log(n)+O(1), and the standard deviation O(sqrt(log(n))). Therefore on a run you save:

log(n) + O(sqrt(log(n)))

iterations...

In reply to RE (tilly) 3: Fisher-Yates Shuffle by tilly
in thread Fisher-Yates Shuffle by Adam

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.