in reply to Re^4: Using threads to run multiple external processes at the same time
in thread Using threads to run multiple external processes at the same time

The manager thread uses Storable to create work units from the dataset arrays; the frozen arrays are placed on the input queue. These are fairly large.

You shouldn't create the subsets in the main thread and queue them to the workers. This is far too costly.

Better:

  1. Share the main array so the workers have access;
  2. Queue the subset criteria to the workers;
  3. Each worker dequeues a set of criteria and uses it to create its subset locally from the shared raw data array.

    As they will only be reading that array, no locking is required.

  4. They generate the subset, pass it to their R instance and wait for the response.

    Unless there is a real need to pass the results back to the main thread, have them finish dealing with them locally before going back for a new criteria set.
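
A minimal sketch of the steps above, in case it helps. Everything specific here is an assumption made up for illustration: the contents of @data, the [low, high] range criteria, the worker count, and analyse_subset(), which merely stands in for passing the subset to R and waiting for the response. Only the shape matters: share the raw array once, queue the small criteria, and subset inside the workers.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# Hypothetical raw data, shared once so every worker can read it in place.
my @data :shared = map { $_ * 3 % 101 } 1 .. 1000;

# Hypothetical criteria: [low, high] value ranges. Small and cheap to queue.
my @criteria = ( [ 0, 24 ], [ 25, 49 ], [ 50, 74 ], [ 75, 100 ] );

my $Q = Thread::Queue->new;

# Stand-in for handing the subset to an R instance and waiting for its reply.
sub analyse_subset {
    my ($subset) = @_;
    my $sum = 0;
    $sum += $_ for @$subset;
    return ( scalar @$subset, $sum );
}

sub worker {
    my @done;
    # An undef on the queue tells the worker there is no more work.
    while ( defined( my $crit = $Q->dequeue ) ) {
        my ( $lo, $hi ) = @$crit;

        # Build the subset locally; @data is only ever read here.
        my @subset = grep { $_ >= $lo && $_ <= $hi } @data;

        # Deal with the result in the worker; only a short summary goes back.
        my ( $n, $sum ) = analyse_subset( \@subset );
        push @done, "$lo-$hi: n=$n sum=$sum";
    }
    return @done;
}

my $WORKERS = 2;    # assumed; tune to your cores / R instances
my @threads = map {
    threads->create( { context => 'list' }, \&worker )
} 1 .. $WORKERS;

$Q->enqueue(@criteria);                # queue the criteria, not the data
$Q->enqueue( (undef) x $WORKERS );     # one terminator per worker

my @report = sort map { $_->join } @threads;
print "$_\n" for @report;
```

The only things that ever cross the queue are the two-element criteria arrays; the big array is never frozen, copied onto a queue, or thawed.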

This way, your workers won't be sitting around idle while your main thread is performing the subsetting for all of them. And you won't be churning over costly shared resources by enqueuing and dequeuing large serialised (Storable-frozen) subsets.

Let the workers do the work; let the manager sit back and manage :)


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP PCW It is as I've been saying!(Audio until 20090817)