in reply to Re^3: better union of sets algorithm?
in thread better union of sets algorithm?
IMO what you are seeing there is NOT lookup time, but rather effects such as cache spoilage and hash remapping. Every time the density of the hash goes above a certain level, the number of buckets in the hash is doubled and all of the elements are remapped. This has a cost that shows up in the type of benchmark that you are doing. You would see most of this effect disappear by presizing the hash before loading it. The extra run time that is a consequence of cache spoilage, however, is basically impossible to eliminate: once the hash gets over a certain size you are more or less guaranteed that a hash lookup will spoil the cache, eliminating the benefit you would get if the whole hash were able to fit inside the cache.
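A minimal sketch of the presizing trick described above (the element count and the Benchmark comparison are mine, purely illustrative): assigning to `keys(%h)` preallocates the bucket array, so the loading loop never triggers a doubling-and-remap.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Compare loading a hash that must grow (and remap) repeatedly
# against one whose buckets are allocated up front.
my @keys = map { "key$_" } 1 .. 100_000;

cmpthese( -3, {
    grown => sub {
        my %h;
        $h{$_} = 1 for @keys;        # buckets double repeatedly as density rises
    },
    presized => sub {
        my %h;
        keys(%h) = scalar @keys;     # presize: allocate buckets before loading
        $h{$_} = 1 for @keys;        # no remapping during the load
    },
});
```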
Also it's worth remembering that every time you use a unique key in _any_ hash _anywhere_, that key is also inserted into a second hash structure (Perl's shared string table) that is used to store the keys themselves, which means that second hash must undergo scaling as well. AFAIK the amortized cost of hash storage or lookup is actually constant. BTW, I believe later perls incorporate a mechanism by which degenerate bucket mappings are avoided; from what I understand it's unusual to get more than two elements in a bucket, and more commonly you won't see more than one.
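If you want to watch the bucket scaling happen, here is a small sketch. One caveat I should flag: a non-empty hash evaluated in scalar context reports "used/allocated" buckets only on perls before 5.26 (which covers perls contemporary with this thread); newer perls just return the key count.

```perl
use strict;
use warnings;

# On pre-5.26 perls, scalar(%h) on a non-empty hash yields a
# string like "5/8" (buckets used / buckets allocated), so you
# can watch the bucket array double as keys are added.
my %h;
for my $i ( 1 .. 64 ) {
    $h{"key$i"} = 1;
    printf "%2d keys => %s\n", $i, scalar(%h)
        if ( $i & ( $i - 1 ) ) == 0;    # report at powers of two
}
```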
Replies are listed 'Best First'.
- Re^5: better union of sets algorithm? by Anonymous Monk on Mar 11, 2005 at 10:17 UTC
  - by demerphq (Chancellor) on Mar 11, 2005 at 10:30 UTC