RCU is optimized for the read case and so writes can be very slow.
Writes aren't actually very slow; it is the garbage collection of prior copies that can be significantly delayed.
The closest thing to writes being slow is that you have to create an entire new unit, so having huge units means copying the full unit in order to make an update. If that makes your writes slow, then you probably need to split that huge unit up.
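For what it's worth, here's a minimal sketch of that copy-then-publish pattern, assuming a single writer and readers that always dereference one shared pointer; the names and the use of the MSVC _InterlockedExchangePointer intrinsic are purely illustrative, and real RCU additionally needs a grace period before the old copy may be freed:

    /* Sketch only: copy the whole unit, modify the copy, publish it with an
       atomic pointer swap.  Readers always dereference 'current'.  The old
       copy must not be freed until no reader can still hold it -- that
       deferred reclamation is the "delayed garbage collection" above. */
    #include <intrin.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct { int a, b, c; } unit_t;     /* the "unit" being replaced */
    static unit_t * volatile current;           /* what readers see          */

    void rcu_style_update(int new_a)
    {
        unit_t *fresh = malloc(sizeof *fresh);
        memcpy(fresh, (unit_t *)current, sizeof *fresh);  /* copy whole unit */
        fresh->a = new_a;                                 /* modify the copy */
        unit_t *old = _InterlockedExchangePointer(
                          (void * volatile *)&current, fresh);  /* publish */
        /* defer free(old) until all readers are done with it */
        (void)old;
    }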
I've reached the conclusion (my conclusion, at least) that RCU doesn't really help with my implementation of a concurrent hash.
My design is based upon the ideas presented in A lock-free hash table. (Note: the premise is that the implementation does not require internal locks, though there will be some internal spin-waits.)
That doesn't mean that the user won't have the ability to lock individual key/value pairs for those situations where the new value of a key is derived from the old value. I'm also reaching the conclusion that I will have to provide the ability to lock the entire hash.
Nice talk. The blog he started just before the video was hosted by his employer, who changed the URL to http://www.azulsystems.com/blog/cliff; that page also links to his new blog, http://www.cliffc.org/blog/. I also found a link to one version of the code.
The design actually comes quite close to RCU in many ways. He uses atomic Compare-And-Set (CAS) as a form of optimistic locking for updaters. (And there is significant work beyond just RCU in order to make that all work.) This means updaters don't have to wait for a lock, but they may have to retry their updates (the number of retries is bounded by the number of threads and, in practice, is almost always very small).
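For anyone following along, this is roughly what such an optimistic CAS update looks like (my own illustration in C using an MSVC intrinsic, not his Java code): the updater reads, computes, and publishes only if the slot still holds what it read; otherwise it retries.

    /* Illustrative CAS retry loop: no lock is taken, but the update is
       retried if another thread changed the slot between read and write. */
    #include <intrin.h>

    long cas_add(long volatile *slot, long delta)
    {
        for (;;) {
            long seen = *slot;                    /* read the current value */
            long want = seen + delta;             /* compute the new value  */
            /* publish only if nobody changed the slot since we read it */
            if (_InterlockedCompareExchange(slot, want, seen) == seen)
                return want;
            /* otherwise another thread won the race: re-read and retry */
        }
    }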
Sounds like it never got folded into java.util, though. I'm a bit curious why, but haven't yet spent the time to try to find any discussions on that.
The discussion about how the size of the hash is tracked suggests that a single generation number could become a bottleneck under a huge number of simultaneous updates. He tracks the number of entries in the hash with a striped counter: each thread updates only one (partial) count, and to get the total you add up all of the partial counts. So the multiple (cyclic) generation counts in a single 64-bit word that you mentioned would likely be a bottleneck for huge numbers of threads. There are probably tons of use cases where that ends up mattering very little, though.
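A rough sketch of the striped-counter idea (my own illustration, not his code; the stripe count and padding are arbitrary): each thread bumps only its own cache-line-padded slot, and a reader sums the slots to get an approximate total.

    /* Illustrative striped counter: increments touch only one slot, so
       threads don't contend on a single hot counter word. */
    #include <intrin.h>

    #define STRIPES 64

    typedef struct {
        __declspec(align(64)) volatile long count;  /* one cache line per stripe */
    } stripe_t;

    static stripe_t stripes[STRIPES];

    void counter_inc(unsigned thread_id)
    {
        _InterlockedIncrement(&stripes[thread_id % STRIPES].count);
    }

    long counter_total(void)        /* approximate while updates are racing */
    {
        long total = 0;
        for (int i = 0; i < STRIPES; ++i)
            total += stripes[i].count;
        return total;
    }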
Nice talk.
It's actually been a while since I last watched it through. Probably the last time was when I last mentioned it here, and the first time a year or so before that.
I do remember that he goes through it at a lick -- thank dog for pause and rewind -- but for me that is preferable to those that labour each point.
What I consider the most significant point in the talk was that he reinforced my long-held stance: the thing that screws up most concurrent programming is that people are always trying to lock-step data between threads and use far too much locking -- far more than is necessary most of the time.
He almost skips over -- certainly de-emphasises -- the point that, assuming correct atomicity of writes to shared data, when one thread reads a shared value it is "the latest" available. It might change a microsecond after, or might have changed a microsecond before; but when you read it, it is the latest value. So do what you need to with it and then try to put it back. If it's changed in the meantime, do it again.
I agree that is very similar to RCU, and for individual updates to values it works fine.
The reason I've rejected it for the iterator problem is that it isn't individual values changing that is the big danger; it is the state of the entire structure changing -- i.e. resizes. In order to use RCU, the iterator would have to take a complete copy of the hash to be able to provide the guarantees you need from an iterator: that all (still existing) keys will be visited once, and only once.
In Cliff Click's scheme, he creates the resized hash and keeps the old one around, and values propagate from the old to the new piecemeal as they are accessed. But he only makes the same guarantees for iterators as the original ConcurrentHashMap, which are basically very few and very weak.
the multiple (cyclic) generation counts in a single 64-bit word that you mentioned would likely be a bottleneck for huge numbers of threads.
I'm hoping that will not be the case. The reason for wanting to restrict the overall version to a 64-bit value is that it is the largest size I can guarantee to handle with a single atomic instruction:
    __int64 _InterlockedCompareExchange64(__int64 volatile *, __int64, __int64);
There is also an atomic instruction for incrementing 16-bit fields:

    short _InterlockedIncrement16(short volatile *Addend);

which I believe means that my version counting and comparing should be both lock-free and wait-free, and thus should not become a bottleneck.
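To make that concrete, here is how I picture it (a sketch of the idea only, not the actual implementation): four 16-bit generation counters packed into one 64-bit word, each bumped wait-free with _InterlockedIncrement16, with the whole word readable and comparable atomically as a single __int64.

    /* Sketch: four 16-bit (cyclic) generation counters sharing one 64-bit
       word, so an individual counter can be bumped wait-free while the
       whole set can be snapshotted and compared as one atomic value. */
    #include <intrin.h>

    typedef union {
        volatile __int64 whole;     /* atomic snapshot/compare unit   */
        volatile short   gen[4];    /* individual generation counters */
    } versions_t;

    static versions_t versions;

    void bump_generation(int which)         /* which = 0..3 */
    {
        _InterlockedIncrement16(&versions.gen[which]);   /* wait-free bump */
    }

    __int64 snapshot_versions(void)
    {
        /* CAS against zero is a common way to read a 64-bit value
           atomically even on a 32-bit build; it only writes if the
           current value happens to be zero, and then writes zero back. */
        return _InterlockedCompareExchange64(&versions.whole, 0, 0);
    }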
RCU does it that way, so reading up on how it works may give you some ideas.
Ignore this. I started to respond negatively to this suggestion, then realised that I didn't yet fully understand the implications of the thing and how it might be adapted to my situation.