in reply to Re^3: Using threads to run multiple external processes at the same time
in thread Using threads to run multiple external processes at the same time

I'm aware of this; that's why I was surprised when I compared the runtimes of the one-worker-thread and two-worker-thread runs. Both took approximately 43 minutes.

I watched the output of ps -u while my program was running: in the two-thread case there were two R processes, each using 99% CPU, and the output suggested they were dividing the jobs between them. Yet the processing took as long as in the single-thread case.

As for the question about messages between threads: The manager thread uses Storable to create work units from the dataset arrays; the frozen arrays are placed on the input queue. These are fairly large.
The worker threads use a second queue to send their results back to the manager thread, but those results are just hashes with about a dozen keys.

I'll construct a minimal example and get back to you.
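For reference, the pipeline described above (the manager freezes each work unit with Storable and puts it on an input queue; workers thaw it, run the analysis, and return a small hash on a second queue) might look roughly like the sketch below. It is not the actual program: the data, the 100-element subsets, and run_analysis(), a hypothetical placeholder that sums numbers instead of calling R, are assumptions made so the sketch runs on its own. It relies on a reasonably recent Thread::Queue, which clones plain hash references into shared memory when they are enqueued.

```perl
#!/usr/bin/perl
# Sketch of the "manager builds and freezes every subset" design.
# Assumptions: @dataset, the subset sizes and run_analysis() are hypothetical.
use strict;
use warnings;
use threads;
use Thread::Queue;
use Storable qw(freeze thaw);

my $NUM_WORKERS = 2;
my @dataset     = (1 .. 1000);          # hypothetical raw data
my $work_q      = Thread::Queue->new;   # manager -> workers: large frozen subsets
my $result_q    = Thread::Queue->new;   # workers -> manager: small result hashes

# Hypothetical stand-in for handing a subset to R and reading its answer.
sub run_analysis {
    my ($subset) = @_;
    my $sum = 0;
    $sum += $_ for @$subset;
    return { n => scalar(@$subset), sum => $sum };
}

my @workers = map {
    threads->create(sub {
        # An undef item is the "no more work" sentinel.
        while (defined(my $frozen = $work_q->dequeue)) {
            my $unit   = thaw($frozen);     # rebuild the whole subset in this thread
            my $result = run_analysis($unit->{data});
            $result->{id} = $unit->{id};
            $result_q->enqueue($result);    # hashref is cloned into shared memory
        }
    });
} 1 .. $NUM_WORKERS;

# Manager: slice every subset itself, serialise it, and queue it.
my $num_units = 10;
for my $id (0 .. $num_units - 1) {
    my @subset = @dataset[$id * 100 .. $id * 100 + 99];
    $work_q->enqueue(freeze({ id => $id, data => \@subset }));
}
$work_q->enqueue((undef) x $NUM_WORKERS);

my %results;
for (1 .. $num_units) {
    my $r = $result_q->dequeue;
    $results{ $r->{id} } = $r->{sum};
}
$_->join for @workers;

my $total = 0;
$total += $_ for values %results;
print "units: ", scalar(keys %results), ", total: $total\n";
```

Note that in this shape the manager does all of the slicing serially, and every subset is copied into and back out of the queue as a frozen string.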

Re^5: Using threads to run multiple external processes at the same time
by BrowserUk (Patriarch) on Sep 04, 2009 at 10:07 UTC
    The manager thread uses Storable to create work units from the dataset arrays; the frozen arrays are placed on the input queue. These are fairly large.

    You shouldn't create the subsets in the main thread and queue them to the workers. This is far too costly.

    Better:

    1. Share the main array so the workers have access;
    2. Queue the subset criteria to the workers;
    3. They dequeue a criterion and use it to create their subset locally from the shared raw data array.

      As they will only be reading that array, no locking is required.

    4. They generate the subset, pass it to their R instance and wait for the response.

      Unless there is a real need to pass the results back to the main thread, have the workers finish dealing with them locally before going back for a new set of criteria.

    This way, your workers won't be sitting idle while your main thread performs the subsetting for all of them, and you won't be churning costly shared resources by enqueuing and dequeuing large serialised (Storable) subsets.

    Let the workers do the work; let the manager sit back and manage :)
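    The four steps above could be sketched as follows. This is a minimal illustration under assumptions, not BrowserUk's code: the criteria are hypothetical "start,end" index ranges, and run_analysis() is a hypothetical stand-in for passing the subset to R and waiting for its reply. The raw array is shared once with threads::shared, only the small criteria strings travel through the Thread::Queue, and each worker deals with its results locally, handing back only a running total when it is joined.

```perl
#!/usr/bin/perl
# Sketch of "share the data, queue only the criteria".
# Assumptions: "start,end" criteria and run_analysis() are hypothetical.
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $NUM_WORKERS = 2;

# 1. Share the raw data once so every worker can read it.
my @dataset :shared;
@dataset = (1 .. 1000);

my $criteria_q = Thread::Queue->new;

# Hypothetical stand-in for feeding a subset to R and reading the reply.
sub run_analysis {
    my ($subset) = @_;
    my $sum = 0;
    $sum += $_ for @$subset;
    return $sum;
}

sub worker {
    my $local_total = 0;
    # 3. Dequeue a criterion; undef means there is no more work.
    while (defined(my $crit = $criteria_q->dequeue)) {
        my ($start, $end) = split /,/, $crit;
        # Build the subset locally; read-only access needs no locking.
        my @subset = @dataset[$start .. $end];
        # 4. Process it and deal with the result here, in the worker.
        $local_total += run_analysis(\@subset);
    }
    return $local_total;    # only a small summary comes back, at join time
}

my @workers;
for (1 .. $NUM_WORKERS) {
    my $thr = threads->create(\&worker);   # scalar context, so join returns one value
    push @workers, $thr;
}

# 2. The manager queues only the small criteria, never the data itself.
$criteria_q->enqueue(join ',', $_ * 100, $_ * 100 + 99) for 0 .. 9;
$criteria_q->enqueue((undef) x $NUM_WORKERS);

my $total = 0;
$total += $_->join for @workers;
print "total: $total\n";
```

    Here the manager's only job is to hand out criteria and collect the totals; all the slicing and processing happens in the workers, concurrently.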


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^5: Using threads to run multiple external processes at the same time
by bot403 (Beadle) on Sep 04, 2009 at 15:00 UTC
    Sorry, I misunderstood you. One vs. two worker threads on a 2-CPU machine certainly should not take the same amount of time. Something is clearly not right in how the work is being divided.