in reply to Re^7: RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)
in thread RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)

But as I said in an earlier thread, I'd use a b-tree for this ... so it's only a suggestion of something else to think about ;)

Despite my earlier somewhat flippant remark about my history with B-trees, I did give them some thought at that time. This is the logic I used to exclude them for this.

The main advantages B-trees have over binary trees are:

  1. Reduction in the number of pointers needed.

    But they still require quite a lot of pointers; which at 8-bytes per is a significant cost of memory.

  2. The "blocking" of similar values.

    With on-disk DBs, this leads to a significant saving in expensive reads from disk; but not applicable here.

    The increased locality of reference would play to the strengths of cpu caching; but it would require very careful design and tuning to get the most out of that. (Think Judy arrays complexity.)

But in the final analysis; a B-tree is O(logN) for lookups; and any advantages B-tree might have over other trees is only titivating at the edges; some reduction in memory requirement; some potential for reduced cache misses.

My (crude & flawed) test implementation of this hash has demonstrated O(1.021) average lookup probes for 175 million in a 200,000,033 array (87% fill); and both the insert & lookup code is trivial; and very fast.

With a b-tree, the average lookup for 175 million requires 8.25 probes; and the structure would require 3 times the space.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
  • Comment on Re^8: RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)