Re^4: mathematical proof

As soon as someone talks about hash inserts being O(1) I assume that they are talking about the average case performance when your hash algorithm is working. I'm sorry I didn't make that explicit. If you wish to qualify everything I said about hashes with "in the average case", do so because that is correct. But if you wish to mix an analysis of the average case with comments about a worst case type of scenario, that's wrong.

Incidentally if you look at the worst case performance of a hash that uses linked lists for the buckets, then the hash access is O(n) which means that building a hash with n buckets cannot be better than O(n*n). Which clearly loses to the array solution. Conditional logic to try and reduce the likelyhood of worst case performance cannot improve this fact unless you replace the linked list in the bucket with something else (like a btree).

Changing the structure of the buckets will make the average case hash access O(1) and worst case O(log(n)) but complicates the code and makes the constant for a hash access worse. People tend to notice the average case performance, and so do not make that change. Real example: when abuse of the hashing function showed up as a security threat in Perl, multiple options were considered. In the end rather than making the worst case performance better, it was decided to randomize the hashing algorithm so that attackers cannot predict what data sets will cause performance problems.

Comment on Re^4: mathematical proof

Replies are listed 'Best First'.
Re^5: mathematical proof by JavaFan (Canon) on Feb 03, 2009 at 15:01 UTC
As soon as someone talks about hash inserts being O(1) I assume that they are talking about the average case performance when your hash algorithm is working. So do I (well, I try to avoid the term 'average' - in this case, I'd use 'expected'. 'amortized' is another term a layman may call 'average'). But I stop assuming that as soon as 'worst case' is mentioned. Or 'mathematical proof'.	[reply]
Re^6: mathematical proof by tilly (Archbishop) on Feb 03, 2009 at 16:40 UTC
Why would you stop making that assumption when "mathematical proof" is mentioned? We can mathematically prove what happens in the average case as easily as we can analyze the worst case, and frequently do so. Furthermore, as I already demonstrated, programmers usually care more about the average case than the worst case. And yes, average can mean several different things. I was slightly sloppy about that, but not so sloppy that I think it would cause any real confusion. However I stay away from "expected" with a lay audience because I worry that laypeople are likely to misunderstand "expected" as "median". Instead I'd lean towards "amortized".	[reply]
Re^6: mathematical proof by ikegami (Patriarch) on Feb 03, 2009 at 17:10 UTC
Sorry, but "average case" is well recognized and accepted terminology.	[reply]