in reply to Advanced Data Structure Question

I seem to recall from my compsci classes some years ago :-) that you generally trade insertion speed for access speed. That said, it sounds like your basic data structure should be a "traditional" (not Perl) hash: buckets that hold multiple values, plus a hash function. You could then make the buckets arrays, trees, or deques, depending on what your incoming data looks like. At this small size, arrays are likely to make good buckets: if you make 1000 buckets, there will be few collisions. You should also store the size of each bucket to make the distance computations fast.

Why is "binary search" important for find? The hash, of course, does not provide that at the highest level, but could within the buckets. The hash is likely to be faster than the b-search.

HTH, --traveler

Re^2: Advanced Data Structure Question
by Limbic~Region (Chancellor) on Apr 19, 2006 at 14:51 UTC
    traveler,
    Given an ordered list, a binary search is the fastest way to find an element. I didn't realize there was a way to use a hashing function and preserve order as well. I thought, apparently incorrectly, that those two things were in conflict with one another. I guess I will look into hashing functions further if you feel they could do the job.

    Cheers - L~R

      Given any list, a hashed search is the fastest way to find an element, because it is generally O(1) (versus O(log n) for a binary search), particularly if you have a bound on the number of elements, as you do. You can use 1001 as your modulus and, with a little finagling, guarantee uniqueness in your buckets.

      If you need to maintain order, do so with another data structure. In almost all cases, you can trade RAM for CPU.
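
      An untested sketch of that trade: a plain Perl hash for the O(1) lookup, a second sorted array for the order. The keys are made up:

          use strict;
          use warnings;

          my (%seen, @ordered);    # hash answers "is it there?", array keeps the order

          sub insert {
              my ($key) = @_;
              return if $seen{$key}++;             # O(1) average membership test
              my $i = 0;
              $i++ while $i < @ordered && $ordered[$i] lt $key;
              splice @ordered, $i, 0, $key;        # parallel list stays sorted
          }

          insert($_) for qw(pear apple mango apple);
          print exists $seen{apple} ? "hit\n" : "miss\n";   # constant-time find
          print "@ordered\n";                               # apple mango pear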


      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
        Given any list, if memory access speed is constant, a hashed search is the fastest on average.

        Real computers do not have constant memory access time. A tree structure that preserves locality of reference can therefore be faster if the application has "hotspots".

      There are ordered hash tables. Knuth did some work with them. You can also have a parallel array with the order information.
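
      A hand-rolled version of that parallel-array idea (untested sketch; Tie::IxHash on CPAN packages up the same thing):

          use strict;
          use warnings;

          my (%value_of, @insert_order);   # hash for lookup, array for the order info

          sub store {
              my ($key, $val) = @_;
              push @insert_order, $key unless exists $value_of{$key};
              $value_of{$key} = $val;      # updating keeps the original position
          }

          store(banana => 3);
          store(apple  => 5);
          store(banana => 4);              # update; banana stays first

          print "$_ => $value_of{$_}\n" for @insert_order;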