As you posted it, the code works fine for me.

When using threads (as when using Unicode or the other fairly recent additions to Perl), you should probably run the latest stable version of perl and of the relevant modules.

I'm running perl 5.8.8 on Debian Linux, with threads version 1.62 (a few months old) and threads::shared version 1.12 (the current version). Note: the threads modules included in the standard perl 5.8.8 distribution are much older than that.

Update: yes, I'm running this on an Intel Core 2 Duo machine (32-bit Linux), so it would behave like a multi-processor machine.

Re^4: baton passing threads and cond_signal
by Anonymous Monk on Aug 22, 2007 at 02:55 UTC

    I have the same versions of everything (perl 5.8.8, threads 1.62, threads::shared 1.12; I reinstalled everything to be sure), and without the yields or sleep 0 it locks up within a couple of seconds. The differences are that I'm on XP, and on a single processor.

    It would be good to know which is the cause: the processor or the platform?

    Thanks for your kind help.

      The problem may lie in differences between the thread implementations on Unix and Windows. If, for example, cond_broadcast() "nests" in one implementation and not in the other, you may get cascading broadcasts, i.e.:

      Say you've got 10 threads waiting on the same condition, as in your code. When you broadcast, you wake up all the threads one after another. But in the meantime, one of the threads (the one whose thread id matches the baton) will broadcast. So now you start waking up all the threads again before you've finished waking the rest from the first broadcast, and so on. That might grind the program to a halt fairly quickly if an earlier broadcast does not stop when a new broadcast is issued on the same condition variable.

      Now I'm not sure if that's what happens. If it is, I suspect you'd see the process taking up a lot of CPU time, while if the problem is some kind of deadlock, you'd see the process taking essentially no CPU time at all.

        This is definitely a deadlock rather than cascading broadcasts. First, the CPU usage drops to 0; second, I've been testing a version that passes the baton back and forth between just 2 threads using cond_signal, and it exhibits exactly the same behaviour. Unless I inject some extra context switches with yield, it hangs almost immediately.

        Having added some very lightweight tracing, I can see that the problem is that sometimes the signal or broadcast is just missed.

        This seems to be a consequence of the following, from the threads::shared POD:

        If there are no threads blocked in a cond_wait on the variable, the signal is discarded.

        If the first thread signals but is interrupted before it can loop back to re-lock the variable and re-enter the wait, then the second thread wakes up, does its thing, and then signals while no thread is waiting, so the signal is discarded. Now both threads enter cond_wait and wait for a signal that will never come.

        Is there any way around this? I can't see one.
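        A common way to make this race harmless, rather than trying to win it, is to keep the turn itself in the shared variable and have each thread test it while holding the lock, calling cond_wait only when it is not yet its turn. A signal sent while nobody is waiting is still discarded, but nothing depends on it any more: the other thread sees the updated value before it decides to wait, and it cannot miss the change between the test and the wait, because making the change requires the same lock. The sketch below assumes the two-thread cond_signal variant; `player`, `$ROUNDS`, `$passes` and `@log` are hypothetical names, not taken from the original code.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical setup: two threads (ids 0 and 1) pass a baton back and
# forth $ROUNDS times using cond_signal.
my $ROUNDS = 1000;
my $baton  :shared = 0;   # id of the thread whose turn it is
my $passes :shared = 0;   # hand-offs performed so far
my @log    :shared;       # order in which the threads took their turns

sub player {
    my ($me) = @_;
    my $other = 1 - $me;
    while (1) {
        lock($baton);     # released at the end of each loop iteration
        # Test the shared state under the lock and only wait if it is not
        # our turn. If the other thread signalled while we were not waiting,
        # the signal is lost, but $baton already says it is our turn, so we
        # never call cond_wait at all.
        cond_wait($baton) until $baton == $me || $passes >= $ROUNDS;
        last if $passes >= $ROUNDS;
        push @log, $me;
        $passes++;
        $baton = $other;
        cond_signal($baton);   # at most one other thread can be waiting
    }
}

my @thr = map { threads->create(\&player, $_) } 0, 1;
$_->join for @thr;
print "passes=$passes\n";
```

        Because correctness no longer depends on scheduling order, no yield or sleep 0 should be needed. The `until` loop also guards against spurious wakeups.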

        On a multi-CPU system the threads are less likely to be interrupted before they reach the wait, so the problem doesn't occur.

        I suspect that without the yields, the code above would also lock up quite quickly if run on a single processor under Linux.

        It may happen less frequently there, as the mutex and semaphore used by cond_wait will be manipulated by highly tuned kernel code within each call to the pthreads library, rather than by multiple calls to several different kernel routines, as is the case with the Windows emulation.

        Less frequently, but it still leaves a latent bug waiting to occur.

        The problem remains: how do you ensure that at least one thread in the broadcast version, or the other thread in the signal version, always gets back into cond_wait before the next thread to be woken signals?
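        For the broadcast version, one approach that should sidestep the question is to make the baton hold the id of the thread whose turn it is, and have every woken thread re-test it under the lock, going straight back into cond_wait unless the baton now names it. Then no thread has to reach cond_wait before the next one signals, because a thread that arrives late sees the baton value instead of relying on the broadcast. This is a sketch under those assumptions; the ring size, round count and names (`worker`, `$NTHREADS`, `$ROUNDS`, `$passes`, `@log`) are hypothetical.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $NTHREADS = 5;         # hypothetical number of threads in the ring
my $ROUNDS   = 200;       # hypothetical number of hand-offs before stopping
my $baton  :shared = 0;   # id of the thread whose turn it is
my $passes :shared = 0;   # hand-offs performed so far
my @log    :shared;       # order in which the threads took their turns

sub worker {
    my ($id) = @_;
    while (1) {
        lock($baton);
        # Wait only while it is someone else's turn; a broadcast that
        # happened before we got here is not needed, since $baton is
        # checked before waiting.
        cond_wait($baton) until $baton == $id || $passes >= $ROUNDS;
        last if $passes >= $ROUNDS;
        push @log, $id;
        $passes++;
        $baton = ($id + 1) % $NTHREADS;
        # Wake every waiter; each re-tests $baton and all but the new
        # holder go back to waiting. The final broadcast also releases
        # everyone once $passes reaches $ROUNDS.
        cond_broadcast($baton);
    }
}

my @thr = map { threads->create(\&worker, $_) } 0 .. $NTHREADS - 1;
$_->join for @thr;
print "passes=$passes\n";
```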