Re: multithreads newbie question

daverave:

There are many ways you can manage code in threads. Since you have a bunch of dependency relationships in your subroutines, though, I'd suggest a worker/task model. In it, you have a set of worker threads, and they simply pull jobs off the queue. Your queueing logic needs to handle the dependencies somewhere. I'd suggest something simple like:

my $quit = 0;
my %tasks = (
   sub_1 => { code=>\&sub_1,
              state=>'idle', 
              deps=>[ ]
   },
   sub_2 => { code=>\&sub_2,
              state=>'idle',
              deps=>[ ]
   },
   sub_3 => { code=>\&sub_3,
              state=>'idle',
              deps=>[ 'sub_1' ]
   },
   sub_4 => { code=>\&sub_4,
              state=>'idle',
              deps=>[ 'sub_1' ]
   },
   sub_5 => { code=>\&sub_5,
              state=>'idle',
              deps=>[ 'sub_2' ]
   },
   sub_6 => { code=>\&sub_6,
              state=>'idle',
              deps=>[ 'sub_2' ]
   },
   sub_7 => { code=>\&sub_7,
              state=>'idle',
              deps=>[ 'sub_1', 'sub_2' ]
   },
);

sub sleep {
    # No work to do, so doze for a while
    ...
}

sub get_next_task {
   for my $cur (keys %tasks) {
      next unless $tasks{$cur}{state} eq 'idle';
      my $deps_not_ready = 0;
      for my $dep (@{$tasks{$cur}{deps}}) {
         if ($tasks{$dep}{state} ne 'done') {
            ++$deps_not_ready;
         }
      }
      next unless $deps_not_ready == 0;
      # $cur is idle and all of its dependencies are done
      return $cur;
   }
   # Nothing is ready to run right now
   return;
}

sub thread {
    while (! $quit) {
       my $task = get_next_task();
       if (defined $task) {
          $tasks{$task}{state}='busy';
          $tasks{$task}{started}=time;
          if (&{$tasks{$task}{code}}()) {
             $tasks{$task}{state}='done';
          }
          else {
             $tasks{$task}{state}='FAULT';
          }
          $tasks{$task}{finished}=time;
       }
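       else {
          # No tasks are available right now. The & forces a call to the
          # sleep sub defined above; a bare sleep() would call Perl's
          # builtin, which blocks forever when given no argument.
          &sleep();
       }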
    }
}

The preceding code is untested and is only a description of how I'd approach your problem. It also has a race condition: if a task switch happens between get_next_task returning a task and the thread marking that task 'busy', multiple threads could start processing the same task. You'll need to put an interlock (such as a mutex) in there somewhere. (For simplicity, I'd put something like a spinlock at the top of get_next_task, allow only one thread at a time to get a task from the list, and mark the task 'busy' before releasing the lock.)
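To make the interlock idea concrete, here is a minimal sketch using threads::shared. It is not the code above made thread-safe; it rests on a few assumptions of my own: only the task states live in a shared hash (code refs can't be stored in shared data, so %code and %deps are ordinary hashes that each thread gets a copy of), claim_next_task and all_finished are hypothetical names, and the three stand-in tasks always succeed. It needs a perl built with thread support.

use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical stand-ins for the real subroutines; each returns true.
my %code = map { ( $_ => sub { return 1 } ) } qw(sub_1 sub_2 sub_3);
my %deps = ( sub_1 => [], sub_2 => [], sub_3 => ['sub_1'] );

# Shared between threads: task states, plus a run counter for checking.
my %state :shared;
my %runs  :shared;
%state = map { ( $_ => 'idle' ) } keys %code;
%runs  = map { ( $_ => 0 ) } keys %code;

# Find a ready task and mark it busy while still holding the lock,
# so no two threads can claim the same task.
sub claim_next_task {
    lock(%state);
    for my $cur (sort keys %state) {
        next unless $state{$cur} eq 'idle';
        next if grep { $state{$_} ne 'done' } @{ $deps{$cur} };
        $state{$cur} = 'busy';
        return $cur;
    }
    return;
}

# True once no task is idle or busy. Caveat: a FAULT would leave its
# dependents idle forever, so real code needs a policy for that case.
sub all_finished {
    lock(%state);
    return !grep { $_ eq 'idle' || $_ eq 'busy' } values %state;
}

sub worker {
    until (all_finished()) {
        my $task = claim_next_task();
        if (defined $task) {
            my $ok = $code{$task}->();
            lock(%state);
            $runs{$task}++;
            $state{$task} = $ok ? 'done' : 'FAULT';
        }
        else {
            threads->yield();    # nothing ready; let another thread run
        }
    }
}

$_->join for map { threads->create(\&worker) } 1 .. 3;
print "$_: $state{$_} (ran $runs{$_}x)\n" for sort keys %state;

The point of the sketch is that the search and the state change happen under one lock. Locking only the search would still let two threads pick the same idle task.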

If your dependency tree is complete, you can even reduce the set of tasks. There's no reason you couldn't have the first task execute sub_1, sub_3 and then sub_4, for example. That would remove two entries from %tasks.
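As a rough illustration of merging entries, the sketch below wraps sub_1, sub_3 and sub_4 in a single task. The names chain and chain_1 are hypothetical, and the three subs are stand-ins that just record that they ran.

use strict;
use warnings;

# Hypothetical stand-ins; each records its name and returns true.
my @order;
sub sub_1 { push @order, 'sub_1'; return 1 }
sub sub_3 { push @order, 'sub_3'; return 1 }
sub sub_4 { push @order, 'sub_4'; return 1 }

# Build one code ref that runs the given subs in order and
# stops at the first one that fails.
sub chain {
    my @subs = @_;
    return sub {
        for my $code (@subs) {
            return 0 unless $code->();
        }
        return 1;
    };
}

my %tasks = (
   chain_1 => { code  => chain(\&sub_1, \&sub_3, \&sub_4),
                state => 'idle',
                deps  => [ ]
   },
);

my $ok = $tasks{chain_1}{code}->();
print "chain_1 ", ($ok ? 'done' : 'FAULT'), ": @order\n";

The trade-off is that sub_3 and sub_4 no longer run in parallel with each other, and anything that depended on sub_1 alone (sub_7 in the table above) would now have to wait for the whole chain.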

The reasons I like this particular approach are:

You can easily adjust the number of threads without worrying about rearranging the dependency tree.
If you use some persistence, you can even pause/stop the scheduler and resume later.
I've already implemented it once, so I can reuse some old code... ;^)
Editing the dependency tree is simple.
Since the task states are maintained in the structure, it's easy to build a reporting screen showing the progress of the system.
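For the reporting point, a minimal sketch of what such a screen could be built from. The report sub and the snapshot values are made up for illustration; only the state, started and finished keys come from the code above.

use strict;
use warnings;

# Hypothetical snapshot of %tasks partway through a run.
my %tasks = (
   sub_1 => { state => 'done', started => 100, finished => 112 },
   sub_2 => { state => 'busy', started => 105 },
   sub_3 => { state => 'idle' },
);

# Return one line per task: name, state, and elapsed seconds
# for tasks that have finished.
sub report {
    my ($tasks) = @_;
    my @lines;
    for my $name (sort keys %$tasks) {
        my $t = $tasks->{$name};
        my $elapsed = defined $t->{finished}
                    ? ($t->{finished} - $t->{started}) . 's'
                    : '-';
        push @lines, sprintf '%-6s %-5s %s', $name, $t->{state}, $elapsed;
    }
    return @lines;
}

print "$_\n" for report(\%tasks);

In a threaded run the structure would have to be read under the same lock the workers use, otherwise the report could catch a task halfway through a state change.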

If you use this idea, feel free to post the finished code when you're done. That way, I can use it in the future. (My version was in C#, and it might be handy to have it in perl some time...)

...roboticus

Re^2: multithreads newbie question by BrowserUk (Patriarch) on Aug 10, 2010 at 12:54 UTC
Oh those good ol' bad ol' days :)
Re^2: multithreads newbie question by daverave (Scribe) on Aug 10, 2010 at 13:24 UTC
roboticus, I wanted to thank you for your kind attention. I'm going with BrowserUk's solution since it seems to fit my demands and be simpler, but it was really nice to learn how to use a worker/task model. Thanks!