in reply to Re^4: Problem in Inter Process Communication
in thread Problem in Inter Process Communication

Well, the kind of model I have thought of is this: the application starts and, based on the number of children I can fork (the app starter defines it based on the server load), I fork that many children.

I have to concur that there is no benefit to mixing forks and threads in the way you have. If you want 9 threads to run, start 9 threads in a single process rather than forking 3 times and starting 3 in each.

I've spent a good while going over the code you've posted, and reading your descriptions of the application, but I still can't make sense of what you are actually trying to do. You've described how you think you should do something, but no real detail of either what you are doing, or why you think you should do it this way.

For example: You start your query threads with a subsection of the work items. Then your architecture calls for those threads to process one item, then signal the main thread and suspend, whilst the main thread starts another thread to further process the results obtained. And, presumably, once the started thread finishes that further processing, it signals the main thread and dies, and the main thread signals the suspended thread to move on to the next work item.

That's way too complicated and very wasteful of resources. You are using two threads to process each work item, but only one of them can actually run at any given time. And you are going to have to start a second thread (an expensive process) to finish processing each work item, whilst the thread that started processing that work item sits around idle. Not to mention all the complexities of the signalling.

It would be far better to have the worker threads:

  1. pick one item off a shared queue;
  2. perform the query for that item;
  3. perform the comparison for that item;
  4. perform whatever output and clean-up is required;
  5. loop back to 1 and process the next work item.

The basic pseudo code for the main thread is:

  1. Create a queue (Thread::Queue).
  2. Start N worker threads passing the queue handle. (Storing the thread handles.)
  3. Push the list of work items (clients) onto the queue.
  4. Push N x undef into the queue (to terminate the threads when there are no more work items).
  5. Call join() on the accumulated array of thread handles. (Thereby blocking until the workers are done).
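The five steps above, sketched as a minimal program (the shared $processed counter is my addition, there only to show the work really happened):

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $maxThreads = 4;                          # "N"; tune to your load
my @clients    = qw( client1 client2 client3 );

my $processed : shared = 0;                  # illustrative; counts finished items

my $Q = Thread::Queue->new;                           # 1. create the queue
my @workers = map { threads->new( \&worker, $Q ) }    # 2. start N workers,
              1 .. $maxThreads;                       #    keeping the handles
$Q->enqueue( @clients );                              # 3. push the work items
$Q->enqueue( ( undef ) x $maxThreads );               # 4. one undef per worker
$_->join for @workers;                                # 5. block until all done

print "$processed items processed\n";                 # prints "3 items processed"

sub worker {
    my $Q = shift;
    while( defined( my $item = $Q->dequeue ) ) {
        ## perform query / comparison / output for $item here
        { lock $processed; ++$processed; }
    }
}
```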

And basic pseudo-code for the worker threads is:

sub worker {
    my( $Q ) = shift;
    while( my $workItem = $Q->dequeue ) {
        ## Perform query
        ## Perform comparison
        ## Perform output/cleanup
    }
}

No signalling, no locking, no forking, no user-explicit sharing, and completely scalable. The queue manages the entire process without any further effort.

Just start with one worker thread until you are sure that the processing logic is correct. Then increase the number slowly until you see no further improvement in throughput. The processing of each item is completely linear, but multiple work items are processed concurrently. Very low complexity, no timing issues or deadlock possibilities.

The only additional complexity I foresee, reading between the lines of your various posts, is that if you are outputting your results to a single file, then you would need to employ a mutex to prevent the output from the worker threads getting interleaved. But that involves just a single shared variable and a simple lock:

## In the main thread:
my $outputMutex : shared;
...
open OUTFILE, '>', ...;

## In the worker threads:
...
{
    lock $outputMutex;
    print OUTFILE ...;
}

I seriously urge you to consider what benefits you think you will get from mixing forks and threads. Actually, on the basis of the information available so far, you could probably write your application to use either, but mixing the two is completely unnecessary as far as I can tell.

Likewise, what benefit is there in suspending one thread and starting another to finish the processing of a single work item? Especially in the light of the cost of starting and discarding use-once threads, and the complexities of the signalling it requires.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^6: Problem in Inter Process Communication
by libvenus (Sexton) on Aug 21, 2008 at 10:53 UTC

    Thanks for your inputs...

    Well, I still haven't finalized the strategy yet... I just wanted some inputs. In my case there are thousands of queries and numerous clients. Each query has a <CLIENT> placeholder which must be replaced at runtime; the query is then fired and the outputs compared. I just wanted to separate the query-firing and comparison processes.

    Also, in the model that you have proposed, how do I add support for handling multiple queries? I can use the same strategy as I had used before...

    use strict;
    use threads;
    use threads::shared;
    use Thread::Queue;

    my $maxnoofThreads = 1;
    my @clientList = qw(
        client1 client2 client3  client4  client5  client6
        client7 client8 client9 client10 client11 client12
    );
    my %hash;
    my $q = new Thread::Queue;
    $q->enqueue( @clientList );

    for( my $i = 0; $i <= $maxnoofThreads; $i++ ) {
        $hash{$i} = threads->new( \&worker, $q, $i );
    }
    $q->enqueue("undef");

    foreach my $thr (values %hash) {
        # Don't join the main thread or ourselves
        #if ($thr->tid && !threads::equal($thr, threads->self)) {
            $thr->join;
        #}
    }

    sub worker {
        my( $Q ) = shift;
        my $i = shift;
        while( my $workItem = $Q->dequeue ) {
            return unless($workItem);
            print " workitem from thread $i -->$workItem\n";
            ## Perform query
            ## Perform comparison
            ## Perform output/cleanup
        }
    }
      Also, in the model that you have proposed, how do I add support for handling multiple queries? I can use the same strategy as I had used before...

      Yes. Second and subsequent queries to the same client will just get picked up by the next available worker.

      However, if it is important to only issue a single query against a given client at any one time, you'd have to take additional steps to ensure that. If that is the case, speak up and I'll see what I can come up with.
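      For the curious, one way to arrange that: a shared "busy set" guarded by lock and cond_wait. This is only a sketch (the sub names claim_client and release_client are mine, purely illustrative), and a worker blocks while holding its work item, which is the price of the simplicity:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %busy : shared;   # clients with a query currently in flight

sub claim_client {
    my $client = shift;
    lock %busy;
    cond_wait %busy while $busy{ $client };   # sleep until the client is free
    $busy{ $client } = 1;
}

sub release_client {
    my $client = shift;
    lock %busy;
    delete $busy{ $client };
    cond_broadcast %busy;                     # wake any workers waiting on it
}

## In a worker, wrap the per-item work:
##   claim_client( $workItem );
##   ## perform query / comparison
##   release_client( $workItem );
```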

      A few comments on your code:

      1. I generally prefer to start the workers, before adding the work items to the queue.

        More habit than for any reason I can remember, but the workers will just block on the queue until there is something to do anyway. And I have vague recollections of it making a difference on at least one occasion.

      2. There is no real benefit to using a hash to store your thread handles.

        An array is simpler and works better.

      3. There is no need to pass an integer to each thread as an identifier.

        You can obtain the process unique thread id using:

        my $tid = threads->self->tid;

        Where threads->self is a class method that returns the thread handle of the current thread, and ->tid is an instance method that returns the thread identifier for the invocant thread handle.

      4. undef is an intrinsic value (not a string, as you have it).

        And whilst you are only starting a single thread for now, it is better to use the configuration variable $maxnoofThreads to ensure sufficient undefs are stacked to terminate all the threads.

        The construct:

        $q->enqueue( ( undef ) x $maxnoofThreads );

        Says: build a list of $maxnoofThreads undefs and push them onto the queue.

      5. As you only put the handles of the worker threads into the list, there is no need to check whether one of them is the main thread.
      6. There is no need to return unless $workItem.

        Once all the work items have been processed, each thread will dequeue an undef, the while loop will terminate naturally, and the thread will terminate when it "falls off the end" of the worker sub.

      7. Any time you use a shared resource (like printing to the terminal or a file) from multiple threads, you should serialise access to that resource.

        I.e., apply locking via a shared variable.

        On some systems, printing to the screen will be serialised by the OS or runtime, but you cannot rely on that everywhere. Also, if the output is redirected to a file, the automatic serialisation goes away.
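        A quick standalone demonstration of the list-repetition construct from point 4 (nothing thread-specific about it):

```perl
use strict;
use warnings;

my $maxnoofThreads = 3;
my @terminators = ( undef ) x $maxnoofThreads;   # (undef, undef, undef)

print scalar @terminators, "\n";                             # prints "3"
print defined $terminators[0] ? "defined" : "not defined", "\n";   # prints "not defined"
```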

      Putting that all together, it looks like this:

      use strict;
      use threads;
      use threads::shared;
      use Thread::Queue;

      my $mtxStdOut : shared;
      my $maxnoofThreads = 1;
      my @clientList = qw(
          client1 client2 client3  client4  client5  client6
          client7 client8 client9 client10 client11 client12
      );

      my $q = new Thread::Queue;
      my @workers = map {
          threads->new( \&worker, $q );
      } 1 .. $maxnoofThreads;

      $q->enqueue( @clientList );
      $q->enqueue( ( undef ) x $maxnoofThreads );

      $_->join for @workers;

      sub worker {
          my( $Q ) = shift;
          my $tid = threads->self->tid;
          while( my $workItem = $Q->dequeue ) {
              {
                  lock $mtxStdOut;
                  print " workitem from thread $tid -->$workItem\n";
              }
              ## Perform query
              ## Perform comparison
              ## Perform output/cleanup
          }
      }

      That's a very lightly customised template that I use for most threaded perl applications. It's flexible enough that it lends itself to many uses and is tested well enough that it just works in most cases.



        One more question on the strategy. Let's say I use the boss-and-worker thread model and make two bosses (query and comp). The query boss would make a threaded queue and create worker threads to work on the query and client. Each worker thread would loop over the clients and, after completing one client, would dump the output to a file and enqueue the location of that file in a global threaded queue. The comp boss would have launched some number of worker threads which block on the global threaded queue.

        Does this model look feasible and scalable? I know the IO would increase, but is it possible and recommended?

      I completely agree with BrowserUk. A much simpler approach can be used, and the one he proposes sounds good (not that he needs me to say that).

      Just a couple more notes: if by "multiple queries" you mean multiple actions on the same server, just pass in an array of the actions you need to perform as part of the queue.
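      A sketch of that idea, assuming each work item can be flattened to a plain "client<TAB>action" string (my encoding, chosen to sidestep any questions about queueing references between threads):

```perl
use strict;
use warnings;

my @clients = qw( client1 client2 );
my @actions = qw( queryA queryB );

# build one work item per (client, action) pair
my @workItems;
for my $client ( @clients ) {
    push @workItems, join( "\t", $client, $_ ) for @actions;
}
## $q->enqueue( @workItems );   # in the main thread

## ...and in a worker, after dequeueing an item:
my( $client, $action ) = split /\t/, $workItems[0];
print "$client -> $action\n";   # prints "client1 -> queryA"
```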

      On that note, to simplify even further, you could avoid all the mutexes, locks, etc. by setting up an output queue and pushing all the results into it; that way only the script's main portion (thread 0) accesses the file.
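      A sketch of that arrangement (names illustrative): each worker pushes its results onto an output queue, plus one undef sentinel when it exits, and only the main thread ever writes the file:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $nWorkers = 3;
my $workQ = Thread::Queue->new;
my $outQ  = Thread::Queue->new;

my @workers = map { threads->new( \&worker ) } 1 .. $nWorkers;

$workQ->enqueue( qw( client1 client2 client3 client4 ) );
$workQ->enqueue( ( undef ) x $nWorkers );

# Only this (main) thread ever touches the file, so no lock is needed.
open my $out, '>', 'results.txt' or die $!;
my $finished = 0;
while( $finished < $nWorkers ) {
    my $result = $outQ->dequeue;
    if( defined $result ) { print {$out} $result, "\n" }
    else                  { ++$finished }   # one undef per exiting worker
}
close $out;
$_->join for @workers;

sub worker {
    while( defined( my $client = $workQ->dequeue ) ) {
        ## perform query / comparison for $client, then:
        $outQ->enqueue( "result for $client" );
    }
    $outQ->enqueue( undef );   # tell the writer this worker is done
}
```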