in reply to Re^2: Using threads to run multiple external processes at the same time
in thread Using threads to run multiple external processes at the same time

Imagine you have 10 units of work each needing 1,000 seconds to complete. You need to spend a total of 10,000 seconds of "work" (CPU seconds).

You can only divide the work efficiently across as many CPUs as you have. With 2 CPUs you can finish in 5,000 seconds. With 10 CPUs you can finish in 1,000 seconds. With 100 CPUs you still need 1,000 seconds unless you can further subdivide your units of work: 90 CPUs will sit idle while 10 do the processing.

Also, the OS takes care of the sharing. No matter whether you have 2 or 1,000 threads, the OS will make sure each of them gets its fair share of time to run. When you have more threads or processes than you have CPUs, they just fight (in a sense) over who gets to execute at any given moment. You can only have as many running programs as you have CPUs: even a modern CPU core runs one program at a time. It just seems like everything is running at once because the OS switches between programs extremely fast. :)
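For example, here is a minimal (untested) worker-pool sketch that caps the number of workers at the number of CPUs; do_work() is just a stand-in for a real 1,000-second unit of work, and $NUM_CPUS is something you would set for your own box:

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $NUM_CPUS = 2;                 # match this to the CPUs in your box
    my $Qwork    = Thread::Queue->new;

    sub do_work { sleep 1 }           # stand-in for a 1,000-second unit of work

    sub worker {
        while ( defined( my $unit = $Qwork->dequeue ) ) {
            do_work( $unit );         # at most $NUM_CPUS units run concurrently
        }
    }

    my @workers = map { threads->create( \&worker ) } 1 .. $NUM_CPUS;

    $Qwork->enqueue( 1 .. 10 );              # the 10 units of work
    $Qwork->enqueue( (undef) x $NUM_CPUS );  # one undef per worker to end it
    $_->join for @workers;

With $NUM_CPUS set to 2, the 10 units finish in about 5 units' worth of wall-clock time; raising the worker count beyond the real CPU count buys you nothing.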


Re^4: Using threads to run multiple external processes at the same time
by kikuchiyo (Hermit) on Sep 04, 2009 at 09:24 UTC
    I'm aware of this; that's why I was surprised when I compared the runtimes of the one-worker-thread and two-worker-thread runs: the processing took approximately 43 minutes in both cases.

    I looked at the output of ps -u while my program was running: in the two-threaded case there were two R processes running, both using 99% of a CPU, and from the output it seemed they were dividing the jobs between themselves. Yet the processing took as long as in the single-threaded case.

    As for the question about messages between threads: The manager thread uses Storable to create work units from the dataset arrays; the frozen arrays are placed on the input queue. These are fairly large.
    The worker threads use a second queue to pass the results back to the manager thread; the results, however, are just hashes with about a dozen keys.
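
    In outline, the setup looks something like this (heavily simplified; make_subsets() and run_R() are placeholders for the real subsetting and R call):

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;
        use Storable qw( freeze thaw );

        my $Qwork    = Thread::Queue->new;  # manager -> workers: frozen subsets (large)
        my $Qresults = Thread::Queue->new;  # workers -> manager: small result hashes

        # stand-ins for the real subsetting and R call
        sub make_subsets { return ( [ 1 .. 5 ], [ 6 .. 10 ] ) }
        sub run_R        { my $subset = shift; return { n => scalar @$subset } }

        sub worker {
            while ( defined( my $frozen = $Qwork->dequeue ) ) {
                my $subset = thaw( $frozen );               # a large array ref
                $Qresults->enqueue( freeze( run_R( $subset ) ) );
            }
        }

        my @workers = map { threads->create( \&worker ) } 1 .. 2;

        # the manager does the subsetting and queues the frozen arrays
        $Qwork->enqueue( freeze( $_ ) ) for make_subsets();
        $Qwork->enqueue( (undef) x @workers );
        $_->join for @workers;

        while ( defined( my $frozen = $Qresults->dequeue_nb ) ) {
            my $result = thaw( $frozen );
            print "got a result covering $result->{ n } rows\n";
        }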

    I'll construct a minimal example and get back to you.
      The manager thread uses Storable to create work units from the dataset arrays; the frozen arrays are placed on the input queue. These are fairly large.

      You shouldn't create the subsets in the main thread and queue them to the workers. This is far too costly.

      Better:

      1. Share the main array so the workers have access;
      2. Queue the subset criteria to the workers;
      3. They dequeue a criterion and use it to create the subset locally from the shared raw data array.

        As they will only be reading that array, no locking is required.

      4. They generate the subset, pass it to their R instance and wait for the response.

        Unless there is a real need to pass the results back to the main thread, have them finish dealing with them locally before going back for a new criteria set.

      This way, your workers won't be sitting around idle while your main thread performs the subsetting for all of them. And you won't be churning over costly shared resources by enqueuing and dequeuing large Storable-serialised subsets.

      Let the workers do the work; let the manager sit back and manage :)
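
      A minimal, untested sketch of that arrangement, assuming the raw data fits in a shared array; matches() and run_R() stand in for your subset test and R invocation:

          use strict;
          use warnings;
          use threads;
          use threads::shared;
          use Thread::Queue;

          # the raw data, shared so every worker can read it;
          # read-only access, so no locking is needed
          my @rawData :shared = map "record $_", 1 .. 1_000;

          my $Qcriteria = Thread::Queue->new;

          # stand-ins for the real subset test and R invocation
          sub matches { my( $rec, $crit ) = @_; return $rec =~ /$crit/ }
          sub run_R   { my $subset = shift; print scalar( @$subset ), " rows processed\n" }

          sub worker {
              while ( defined( my $crit = $Qcriteria->dequeue ) ) {
                  # build the subset locally from the shared array
                  my @subset = grep matches( $_, $crit ), @rawData;

                  # hand it to this worker's R instance and deal with the
                  # result here, rather than queuing it back to the manager
                  run_R( \@subset );
              }
          }

          my @workers = map { threads->create( \&worker ) } 1 .. 2;

          $Qcriteria->enqueue( '1$', '2$', '33' );   # the subset criteria
          $Qcriteria->enqueue( (undef) x @workers );
          $_->join for @workers;

      Note that nothing large ever crosses a thread boundary; only the small criteria strings are queued.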


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      Sorry. I misunderstood you. Certainly 1 vs. 2 worker threads on a 2-CPU machine should not take the same amount of time. Something is clearly not right in how the work is being divided...