in reply to Thread Synchronization Mechanism

Be aware that threads do not run concurrently unless you have multiple CPUs.

Even using a non-polling event (such as cond_broadcast), each of the threads will be started sequentially, and there is no guarantee of the order in which they will be started.

When each thread receives its signal to start, it will then run for a complete timeslice (barring yields). The timeslice will vary from OS to OS, but typically might be somewhere between 3 and 20 milliseconds. With 10 threads, that means the last thread will have no chance of starting until at least 0.03 seconds after the first. Modern CPUs can process a huge amount of data in this time. A crude test with a 2GHz CPU shows perl can iterate a loop incrementing a scalar 1_000_000 times, or perform at least 500_000 exponentiations, in this time.
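A crude benchmark in the spirit of the test described above might look like the following. The exact loop body is an assumption on my part, and the absolute numbers will of course vary with hardware and perl build:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Time 1_000_000 increments of a scalar; on a 2GHz-era CPU this lands
# in the same ballpark as a handful of timeslices.
my $start = time();
my $x = 0;
$x++ for 1 .. 1_000_000;
my $elapsed = time() - $start;

printf "1_000_000 increments took %.1f ms\n", $elapsed * 1000;
```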

Finally, there is no guarantee--in fact it is extremely unlikely--that your pool of threads will run on adjacent timeslices. Remember that the threads in your process are competing with every other thread in the system for timeslices, including all the high and real-time priority threads, which will take precedence. In fact, it is almost impossible that your threads will get equal numbers of timeslices. The usual pattern is that some threads will tend to dominate, and the other threads will be starved of CPU until one or more of the dominant threads terminates or goes into an IO wait state.

The bottom line is that there is simply no way to synchronise threads. Indeed, any design that tries to synchronise threads is fundamentally flawed.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Replies are listed 'Best First'.
Re: Re: Thread Synchronization Mechanism
by pbeckingham (Parson) on May 23, 2004 at 02:47 UTC

    My stated desire for synchronized threads should really have relaxed the criteria somewhat - the need is really to keep the threads away from the task at hand until a certain time. I should have stated it differently.

    Regarding your comment on threads in a pool getting starved of CPU, are you talking about significant differences of, say, 20% in total slices, or something very subtle?

      Okay. The problem is that I have often seen people construct programs that spawn multiple threads of execution, only to go on to try and orchestrate those threads. They end up with hugely complicated sets of locks, states, counters and yields that are a nightmare to balance, and need to be re-balanced after every minor change in the program and, often as not, for each CPU or set of hardware they run on. These types of programs are usually better served by normal event-driven/state-machine techniques.

      The following program and output shows one way of 'holding' the threads until you're ready for them to do some work. The combination of the shared variable $signal, the 'wait' block at the top of the threads, and cond_broadcast( $signal ) effects the type of control you're looking for.

      Code and typical output
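      For readers who don't expand the code above, a minimal, self-contained sketch of the same holding mechanism (not the original code; the worker bodies and thread count here are placeholders) would be:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Workers block in a cond_wait loop on the shared $signal until the
# main thread flips it and issues cond_broadcast.
my $signal : shared = 0;

my @threads = map {
    threads->create( sub {
        my $id = shift;
        {
            lock $signal;
            # loop guards against spurious wakeups and late arrival
            cond_wait( $signal ) until $signal;
        }
        return "thread $id ran";
    }, $_ );
} 1 .. 3;

sleep 1;                           # let the workers reach their wait blocks

{
    lock $signal;
    $signal = 1;
    cond_broadcast( $signal );     # release every waiter at once
}

my @results = map { $_->join() } @threads;
print "$_\n" for @results;
```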

      Note that you cannot predict which order the threads will run in; see the apparent randomness of the thread ids as they start/wait/stop and the presence of many 0's in the counters to start.

      With the tight thread loop (no yielding), each thread tends to run for extended bursts. Note also the unevenness of the runs, especially at first, and also, that after a couple of timeslices, the main (logging) thread seems to be starved of cpu until the other threads complete.

      I believe that the latter effect is a result of dynamic scheduling by the OS. At first it may have 'foreground boost', but when it doesn't make full use of its first few timeslices, its priority is dynamically lowered slightly and so it takes a back seat until the other threads complete. This is 'once informed' speculation, but it fits the empirical evidence and some out-of-date knowledge. It is also likely to vary wildly with different OSs and even configurations. Even Workstation and Server configurations of the MS OSs (NT/2000/XP) have different scheduling schemes. I have no knowledge of what, if any, dynamic prioritisation is done by *nixes. It is probably configurable through a re-build, or maybe even at runtime, and the defaults will likely vary from distribution to distribution.

      You can achieve a measure of 'even-handedness' between threads by employing yield() and a counter to try and balance out the timeslices between threads. Try running the above snippet with a command line parameter of -YIELD=1. The effect is that the foreground thread gets in to do its logging with much greater frequency, and each of the threads appears to be getting its 'fair share' of the CPU.
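      The yield-and-counter idea might be sketched as follows (the -YIELD switch and logging thread from the code above are not reproduced; the loop sizes and yield interval are arbitrary):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Each worker surrenders the remainder of its timeslice every 1,000
# iterations, which tends to smooth out the bursty scheduling.
my @counts : shared = ( 0 ) x 3;

my @workers = map {
    my $i = $_;
    threads->create( sub {
        for ( 1 .. 50_000 ) {
            $counts[$i]++;
            threads->yield() unless $_ % 1_000;   # give other threads a turn
        }
    } );
} 0 .. 2;

$_->join for @workers;
print "final counts: @counts\n";
```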

      Note that it is still non-deterministic, in that you will not be able to know or control which thread gets the next or subsequent timeslices, but it does reduce the 'bursty' behaviour somewhat.

      However, this has already moved into the realm of 'attempted orchestration', which is fraught with problems; it only appears as effective as the output implies because each of the threads is simple and identical. Throw in a thread that starts doing something different, like IO, and you will see dramatically different results in most cases.

      Finally, it is usually better to have your threads block on something that denotes the work they have to do, rather than an artificial signal. The Thread::Queue $Q->dequeue() function is a perfect implementation of this.

      1. You create the queue before starting the threads.
      2. Each thread starts and immediately tries to dequeue a piece of work. The queue is empty, so it blocks.
      3. The main (or other) thread then enqueue()s work as it becomes available, and one of the blocking threads unblocks, processes it, and then goes back to waiting for more.
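      The steps above can be sketched as follows; the worker body and the use of undef as an end-of-work marker are my assumptions, not part of the original description:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $Q = Thread::Queue->new;            # 1. create the queue before the threads
my $processed : shared = 0;

my @workers = map {
    threads->create( sub {
        # 2. dequeue() blocks here until a piece of work arrives
        while ( defined( my $job = $Q->dequeue ) ) {
            { lock $processed; $processed++; }
        }
    } );
} 1 .. 3;

$Q->enqueue( $_ ) for 1 .. 9;          # 3. feed work as it becomes available
$Q->enqueue( undef ) for @workers;     # one terminator per worker

$_->join for @workers;
print "processed $processed jobs\n";
```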

      If you have multiple types of work that need to be processed in a particular sequence, use separate queues for each. The whole chain then becomes data-driven rather than managed. It works much better that way.

      Runs and output done under Win32XP SP1 and perl 5.8.3 (AS809). The results may be entirely different under other OS's and any conclusions drawn from my observations are only applicable to my view of my system. YMMV.
