in reply to Design thoughts: iterator invalidation

I'm beginning to suspect that there is no one right answer to it

That would be my thought too :-) I think which operations can be supported without invalidating the iterator depends very much on how the underlying data structure is implemented. The solution also depends on your requirements: when do you want the iterators to be invalidated? Do you really need every thread to always see the latest version of the data, or are some changes (you mention value updates) acceptable? And there may be performance considerations (e.g. locking the whole hash vs. finer-grained locking for individual operations).

Anyway, I just thought I'd mention that I've found Java's java.util.concurrent classes fairly well thought out; ConcurrentHashMap and CopyOnWriteArrayList, for example. Perhaps those could serve as a bit of inspiration...
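
A minimal sketch of what I mean (class name invented, keys just for illustration): iterating a ConcurrentHashMap while another thread writes to it never throws ConcurrentModificationException, where a plain HashMap very likely would:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ConcurrentIterationSketch {
        public static void main(String[] args) throws InterruptedException {
            Map<String, Integer> map = new ConcurrentHashMap<>();
            map.put("a", 1);
            map.put("b", 2);

            // Keep writing to the map from another thread...
            Thread writer = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) map.put("k" + i, i);
            });
            writer.start();

            // ...while traversing it here. The weakly consistent iterator
            // never throws; a plain HashMap would very likely throw
            // ConcurrentModificationException partway through this loop.
            int seen = 0;
            for (String key : map.keySet()) seen++;

            writer.join();
            System.out.println("iterated " + seen + " keys without an exception");
        }
    }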

Re^2: Design thoughts: iterator invalidation
by BrowserUk (Patriarch) on Feb 21, 2015 at 13:18 UTC
    I just thought I'd mention that I've found Java's java.util.concurrent classes fairly well thought out; ConcurrentHashMap and CopyOnWriteArrayList, for example. Perhaps those could serve as a bit of inspiration...

    Trouble is, when it comes to iterators, the designers of ConcurrentHashMap seem to have abdicated their responsibility. The docs say:

    The view's iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.

    And I can find no mention at all of what happens if another thread resizes the thing when the iterator is halfway through. They even proudly announce: "and there is not any support for locking the entire table in a way that prevents all access."

    I translate that as: this is a hard problem, so we didn't make any attempt to handle it in a clean way; nor to detect possible inconsistencies; nor even to provide a way for you to detect them; nor even a way for you to prevent them.

    But oh, they did think to provide a method to look up keys from their values: containsValue(Object value): "Returns true if this map maps one or more keys to the specified value. Note: This method requires a full internal traversal of the hash table, and so is much slower than method containsKey."

    A bit like a gun maker providing a tool for looking down the barrel to see if it is loaded; and attaching a note: Be careful!



      It's been quite a while since I last worked with it, so this is mostly IIRC, but if I am understanding the documentation correctly, methods like keySet essentially return a snapshot of the key set at the time the iterator was created, so I would think that if the hash is resized by another thread the Iterator would iterate over the "old" key set, and only "may" reflect subsequent changes. And while not quite explicitly stated, because of this snapshot idea, I don't think the Iterators would ever return a key twice, or omit a key that has not been modified since the creation of the Iterator.

      As for locking the entire structure, in Java that's pretty easy to accomplish with a synchronized block over some object (such as the hash table itself), and/or by using synchronizedMap from Collections, although of course locking the entire table is not particularly efficient if done often. I think that's some of the reasoning behind the java.util.concurrent classes: provide a data structure that can safely be used by multiple threads, while still providing decent performance. The docs for the package probably say it best:

      The "Concurrent" prefix used with some classes in this package is a shorthand indicating several differences from similar "synchronized" classes. For example java.util.Hashtable and Collections.synchronizedMap(new HashMap()) are synchronized. But ConcurrentHashMap is "concurrent". A concurrent collection is thread-safe, but not governed by a single exclusion lock. In the particular case of ConcurrentHashMap, it safely permits any number of concurrent reads as well as a tunable number of concurrent writes. "Synchronized" classes can be useful when you need to prevent all access to a collection via a single lock, at the expense of poorer scalability. In other cases in which multiple threads are expected to access a common collection, "concurrent" versions are normally preferable. And unsynchronized collections are preferable when either collections are unshared, or are accessible only when holding other locks.

      Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators provide weakly consistent rather than fast-fail traversal. A weakly consistent iterator is thread-safe, but does not necessarily freeze the collection while iterating, so it may (or may not) reflect any updates since the iterator was created.

      So I think it's a question of the requirements: if changes to the data set need to be seen immediately by all threads, some central lock is probably necessary; if not, an interface like this one might be a solution.
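
      To make the "central lock" option concrete, here's a minimal sketch (names invented) using Collections.synchronizedMap: per its documentation, iterating the wrapper must be done inside a synchronized block on the map itself, which locks out every other thread for the duration of the traversal:

          import java.util.Collections;
          import java.util.HashMap;
          import java.util.Map;

          public class FullLockSketch {
              public static void main(String[] args) {
                  // Every individual call on the wrapper is synchronized
                  // on the wrapper object itself.
                  Map<String, Integer> map =
                          Collections.synchronizedMap(new HashMap<>());
                  map.put("a", 1);
                  map.put("b", 2);

                  // Iteration is NOT atomic by itself: the docs require
                  // manually synchronizing on the map, which gives a fully
                  // consistent view at the cost of scalability.
                  synchronized (map) {
                      for (Map.Entry<String, Integer> e : map.entrySet()) {
                          System.out.println(e.getKey() + " = " + e.getValue());
                      }
                  }
              }
          }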

        methods like keySet essentially return a snapshot of the key set at the time the iterator was created, so I would think that if the hash is resized by another thread the Iterator would iterate over the "old" key set, and only "may" reflect subsequent changes.

        The salient point about the descriptions of methods like keySet is: "Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa."

        Basically, keySet doesn't do anything at all (i.e. no snapshot). It simply gives you access to an alternate set of methods through which you can access the underlying data structure. And there are 3 ways it can be used to iterate that underlying structure (a short sketch after this list demonstrates the live-view behaviour and the staleness of a toArray() snapshot):

        1. You can use the Set view's iterator() method to get a cursor that simply traverses the underlying hashmap's array and collision buckets, top to bottom / left to right.

          Any keys deleted (through the hashmap view) after the Set view was obtained, but before they have been reached by the cursor, will simply never be seen; even though they existed when keySet was called.

          Any keys inserted (through the hashmap view) after the Set view was obtained, but that happen to hash to a slot after the cursor, will be iterated even though they did not exist when keySet was called.

          But the real problem is what happens when the underlying hashmap resizes. Now the cursor over the underlying array/buckets is useless, because the resize will have randomly redistributed the order of the keys in the array. So it is not just possible that the cursor will revisit keys that have already been visited; it is highly likely.

          And the class(es) provide no mechanism for the caller to be notified, nor even to detect, that this has happened.

        2. You can take your own snapshot via toArray(), and then iterate that, passing the keys to contains() or remove().

          This is an unbacked copy of the keys that is stale as soon as it is taken. It will never see new additions, and operations using keys that have since been deleted will simply do nothing.

          Besides the cost, in time and memory, involved in making a copy of all the keys: what does this 'snapshot' represent? In extremis, a millisecond after it is taken, another thread can call clear() on the hashmap, and this thread will then simply spin its wheels, iterating an array of keys that no longer exist.

        3. Hold a pre-existing list of keys and use them on the Set view.

          This suffers all the same problems as the previous way.

          In addition, there'd be no point in obtaining the restricted functionality of keySet() when you could use the pre-existing list against the full functionality of the hashmap.
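
        The live-view behaviour, and how stale a toArray() snapshot can be, are easy to see in a few lines (a sketch, names invented):

            import java.util.Set;
            import java.util.concurrent.ConcurrentHashMap;

            public class KeySetViewSketch {
                public static void main(String[] args) {
                    ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
                    map.put("a", 1);
                    map.put("b", 2);

                    // 1. keySet() is a live view, not a snapshot: a change made
                    //    to the map AFTER obtaining the view is visible through it.
                    Set<String> keys = map.keySet();
                    map.put("c", 3);
                    System.out.println(keys.contains("c")); // true

                    // 2. toArray() really is a copy, and it is stale immediately:
                    //    another thread calling clear() leaves you iterating keys
                    //    that no longer exist anywhere.
                    Object[] snapshot = keys.toArray();
                    map.clear();
                    System.out.println(snapshot.length);      // still 3
                    System.out.println(map.containsKey("a")); // false
                }
            }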

        IMO weakly consistent iterators are worse than full locking; they effectively provide no guarantees at all. The method might as well be called GiveMeARandomSetOfStringsSomeOfWhichMightExistInMyHash(), with the DUPLICATES_POSSIBLE flag permanently enabled.
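
        To see how little the traversal promises, run something like this sketch (names invented): each pass over the same map can report a different number of keys, no exception is thrown, and no API reports that the traversal raced a writer:

            import java.util.concurrent.ConcurrentHashMap;

            public class WeaklyConsistentSketch {
                public static void main(String[] args) throws InterruptedException {
                    ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
                    for (int i = 0; i < 1_000; i++) map.put(i, i);

                    // Churn the table: repeatedly remove and re-insert half the keys.
                    Thread writer = new Thread(() -> {
                        for (int round = 0; round < 1_000; round++) {
                            for (int i = 0; i < 1_000; i += 2) map.remove(i);
                            for (int i = 0; i < 1_000; i += 2) map.put(i, i);
                        }
                    });
                    writer.start();

                    // Each traversal may see a different key count; nothing
                    // signals that the view was inconsistent.
                    for (int pass = 0; pass < 5; pass++) {
                        int seen = 0;
                        for (Integer k : map.keySet()) seen++;
                        System.out.println("pass " + pass + ": saw " + seen + " keys");
                    }
                    writer.join();
                }
            }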

