in reply to Re^3: Design thoughts: iterator invalidation
in thread Design thoughts: iterator invalidation

"methods like keySet essentially return a snapshot of the key set at the time the iterator was created, so I would think that if the hash is resized by another thread the Iterator would iterate over the 'old' key set, and only 'may' reflect subsequent changes."

The salient point about the descriptions of methods like keySet is: "Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa."
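
A minimal sketch makes that "backed by the map" behaviour concrete (plain java.util.HashMap; the class and variable names here are mine, purely for illustration):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    public class KeySetView {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>();
            map.put("a", 1);

            Set<String> keys = map.keySet();          // a live view, not a copy

            map.put("b", 2);                          // change the map...
            System.out.println(keys.contains("b"));   // true: the view sees it

            keys.remove("a");                         // change the view...
            System.out.println(map.containsKey("a")); // false: the map sees it
        }
    }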

Basically, keySet() doesn't do anything at all (i.e. no snapshot). It simply gives you access to an alternate set of methods through which you can access the underlying data structure. And there are three ways it can be used to iterate that underlying structure:

  1. You can use the Set view's iterator() method to get a cursor that simply traverses the underlying hashmap's backing array and collision buckets, top to bottom/left to right.

    Any keys deleted (through the hashmap view) after the Set view was obtained, but before they have been reached by the cursor, will simply never be seen, even though they existed when keySet() was called.

    Any keys inserted (through the hashmap view) after the Set view was obtained, but that happen to hash to a slot after the cursor, will be iterated, even though they did not exist when keySet() was called.

    But the real problem is what happens when the underlying hashmap resizes. Now the cursor over the underlying array/buckets is useless, because the resize will have randomly redistributed the order of the keys in the array. So it is not just possible that the cursor will revisit keys that have already been visited; it is highly likely.

    And the class(es) provide no mechanism for the caller to be notified of, or to detect, that this has happened.

  2. You can take your own snapshot via toArray(), and then iterate that, passing the keys to contains() or remove(). (A concrete sketch of this appears just after the list.)

    This is an unbacked copy of the keys that is stale as soon as it is taken. It will never see new additions, and using keys that have since been deleted will simply fail to do anything.

    Besides the cost, in time and memory, involved in making a copy of all the keys: what does this 'snapshot' represent? In extremis, a millisecond after it is taken, another thread can call clear() on the hashmap, and this thread will then simply spin its wheels, iterating an array of keys that no longer exist.

  3. Hold a pre-existing list of keys and use them on the Set view.

    This suffers from all the same problems as the previous approach.

    In addition, there would be no point in obtaining the restricted functionality of the keySet() when you could use the pre-existing list against the full functionality of the hashmap.
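
Here is the sketch promised in point 2. It is deliberately single-threaded so the outcome is deterministic: the clear() merely stands in for what another thread could do at any moment, and all names are illustrative:

    import java.util.concurrent.ConcurrentHashMap;

    public class StaleSnapshot {
        public static void main(String[] args) {
            ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
            map.put("a", 1);
            map.put("b", 2);

            // An unbacked copy: stale the moment it is taken.
            Object[] snapshot = map.keySet().toArray();

            map.clear(); // stands in for another thread clearing the map

            // The loop dutifully walks keys that stopped existing the
            // moment clear() ran; every remove() reports false.
            for (Object key : snapshot) {
                System.out.println(key + " removed? " + map.keySet().remove(key));
            }
        }
    }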

IMO, weakly consistent iterators are worse than full locking; they effectively provide no guarantees at all. The method might as well be called GiveMeARandomSetOfStringsSomeOfWhichMightExistInMyHash(), with the DUPLICATES_POSSIBLE flag permanently enabled.
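
And a sketch of just how little "weakly consistent" promises, assuming the iterators under discussion are java.util.concurrent.ConcurrentHashMap's (again, names are illustrative). The mid-iteration insert never raises ConcurrentModificationException, and whether the new key is ever visited is unspecified; it can vary from one run to the next:

    import java.util.Iterator;
    import java.util.concurrent.ConcurrentHashMap;

    public class WeaklyConsistent {
        public static void main(String[] args) {
            ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
            for (int i = 0; i < 8; i++) {
                map.put(i, i);
            }

            Iterator<Integer> it = map.keySet().iterator();
            boolean modified = false;
            while (it.hasNext()) {
                int key = it.next();
                if (!modified) {
                    // Structural modification mid-iteration: no exception
                    // is thrown, fail-fast or otherwise...
                    map.put(999, 999);
                    modified = true;
                }
                // ...and whether 999 is ever printed is unspecified.
                System.out.println("visited " + key);
            }
        }
    }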


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked