There are some drawbacks about hashes. Accessing a key takes time, as we need to calculate the hash function. Furthermore, many elements could hash to the same address, which causes chains, and which slow down queries (and deletes). Furthermore, 'clearing' a hash takes time linear in the size of the stored set. (Some of the problems have been fixed in recent perls, but with some overhead when inserting elements). Implementation is also non-trivial.
If we know something about the set we are storing, we can sometimes use datastructures that are faster. For instance, suppose we know that all possible elements of the set are non-negative integers smaller than a fixed number U. We can then build an array (or a vec string) with U elements, say A, and we let i be a member of our set if and only if A [i] == 1. Otherwise, A [i] == 0. Using simple code, we have something like:
It's easy to see that inserting, deleting and querying all take O (1) time. Unfortunally initializing or clearing the structure takes time &Theta (U).use constant U => ....; sub init {@A = (0) x U} sub insert {$A [shift] = 1} sub delete {$A [shift] = 0} sub query {$A [shift]} sub clear {@A = (0) x U}
Now, a number n is in our query set, if and only if, all of the following conditions hold:A B +------+ +------+ U-1 | ? | | ? | +------+ +------+ . . . . . . . . . . . . . . . . . . . . +------+ +------+ 4 | 0 | | 3 | +------+ +------+ 3 | 4 | | 1 | <---- C (== 3) +------+ +------+ 2 | 1 | | 1 | +------+ +------+ 1 | 2 | | 2 | +------+ +------+ 0 | 2 | | 4 | +------+ +------+
Now, how do we insert? Suppose we want to insert a number i. If it's already in the set (that is, the conditions mentioned above are fullfilled), nothing needs to be done. Otherwise, we store i in B [C], we store C in A [i], and we increment C.
Deletion is a bit trickerier, but still straightforward. Suppose we want to delete i. We do that by moving B [C - 1] to B [A [i]], as follows: first B [A [i]] = B [C - 1], then A [B [C - 1]] = A [i]. And finally we decrement C.
In code:
Clearing the set is simple, all one needs to do is to set C to 0.int query (int i) {0 <= A [i] && A [i] < C && B [A [i]] == i} void insert (int i) {if (!query (i)) {B [C] = i; A [i] = C; C ++}} void delete (int i) {if (query (i)) { B [A [ i]] = B [C - 1]; A [B [ C - 1]] = A [i]; C --}}
So, we can do querying, inserting, deleting, and clearing in O (1) time. Not bad. But what about initializing? Well, that can be done in constant time as well. At least, in low level languages. The key is that we start off with C being 0, so no matter what is already in A or B, it never causes elements to be present. So, we do not need to initialize the arrays. As long as we can claim memory of appropriate size.
Hence, we can do all five operation in constant time - as long as the programming language doesn't insist on initializing.
Abigail
In reply to Data structure challenge by Abigail-II
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |