in reply to Re: OT: Locking and syncing mechanisms available on *nix.
in thread OT: Locking and syncing mechanisms available on *nix.

and why mutexes are also necessary.

Sorry, but that example is contrived to need a mutex. Its signalling condition involves a global variable: lock the associated mutex, then check the value of that global.

Now consider the case of a signalling condition that doesn't involve a global variable. Then what use is the mutex?

As for the performance characteristics of mutexes: if I am misinformed, then so, it seems, is half of the internet.

With regard to your TCP/IP example: it is understandable that mutexes are not a bottleneck where IO is involved. Even the 300 to 400 cycles it takes to make a ring3 - ring0 - ring3 transition pale in comparison with IO waits.

But purely in user space, atomic instructions--even combined with barrier instructions--are much faster for memory-to-memory operations.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: OT: Locking and syncing mechanisms available on *nix.
by Illuminatus (Curate) on Mar 28, 2011 at 02:21 UTC
    Well, I can't speak for Windows (which, I gather from your posts over time, you are an expert in), but for Linux I believe you will find that the scheduler manipulation is more overhead than whichever signalling mechanism you choose. If you really only want a thread to reach a certain point and then wait until it is told to continue, you can use suspend/continue. The first thread calls pthread_suspend on itself, which takes it off the run queue (so it no longer consumes any CPU). The second thread then calls pthread_continue to 'signal' it to continue. Your only overhead is the manipulation of the scheduler's queues, which has to happen regardless of your mechanism.

    fnord

      Usage scenario: most of the time, the producer will be running on one core and the consumer on another, and they will be producing and consuming from their respective ends of the shared memory structure as fast as they can go. No locking; no syncing; no (elective) context switching.

      Occasionally, one end or the other will get preempted by some higher-priority thread. At that point, the shared data structure will become either full or empty, depending upon which end is still running. That end then needs to enter a wait state until the other end gets another timeslice, does its thing--relieving the empty or full state--and wakes it to continue.

      Most of the time, given a correctly sized and well-written buffering data structure, the above scenario is both lock-free and wait-free, and requires no system calls (ring3/ring0/ring3 transitions). Both consumer and producer threads are free to run as fast as their processing requirements allow and to utilise their full timeslices. The latter point is the key to maximum utilisation.

      If I use suspend/resume, buffer empty/full conditions are guaranteed to require not only multiple calls into the kernel, but also (at least one) very expensive context switch. If I use cond_vars and (unneeded) kernel mutexes, that means an expensive call into the kernel for every read & write.

      The whole point of lock-free & wait-free algorithms is that they avoid both expensive calls into the kernel and expensive elective context switches--ie. non-pre-emptive ceding of the CPU--in order to make full use of each allotted time-slice.

      The point of fast user-space mutexes (futexes) is that, in the uncontended case, they run entirely in user space, and are therefore faster.

      The (lock-free/wait-free) algorithms are becoming better and better defined, and the hardware support (CAS, XCHG and similar SMP-safe atomic instructions) improves with every new generation of processors.

      The current limitation is locking, syncing and signalling mechanisms that were designed for single-processor/core IPC purposes. Given that much of the HPC research is done on *nix boxes of one flavour or another, I know there are better mechanisms out there. This thread was meant to enlist help in finding them, not to argue about whether they are possible, or even required.


        One last note and I'm done. I'm not sure which half of the internet you are looking at regarding mutexes, but a lot of what comes up in searches is fairly old. The Linux thread implementation was completely rewritten for 2.6 (about six years ago). I did not have a chance to look at the condition-variable code, but I looked at the mutex code in more recent glibc releases (2.12), and pthread_mutex_lock runs completely in user space unless contention occurs. Only then does it go to the kernel to block the thread. You can also use pthread_mutex_trylock to see whether contention would occur, and this too runs completely in user space. This is covered in the NPTL white paper on p8-9.

        The futex API is exposed for these implementations, but is hedged with caveats along the lines of 'you had better know what you are doing'. This approach would also limit your compatibility to Linux.

        Lastly, I found this list of links to implementations of the kind of algorithms you are looking for. I took a look at the atomic_io in qprof, and it looks fairly complete, if primitive.

        It is likely that all of these will give you portability issues if you really need *nix flexibility.

        fnord