in reply to Re^4: Using threads to run multiple external processes at the same time
in thread Using threads to run multiple external processes at the same time

I too am confused by your results. Right now I have 3 speculative possibilities:

  1. Your CPU isn't a real dual-core, but rather one of those pseudo-dual-core hyperthreaded things, and your OS is presenting it as if it were a true dual-core.
  2. One or more of the R dlls is non-threadsafe, and they've dodged the issue by using a semaphore to serialise access.

    This seems to be the most likely explanation.

  3. There is something inherent in your implementation that is causing the threads to serialise.

    I haven't been able to spot anything from a fairly extended inspection, but there is rather too much code to comprehensively 'run it in my head'.

    Unfortunately, even if I had the data files, I would still not be able to run it here, as neither IPC::Open2 nor any of the alternatives works worth a damn on my platform.

My best suggestion for isolating which (if any) of the above is the problem is to log the command output of your demo run, split it in two, start two manual R sessions, pipe half to each, set them both going at the same time, and see how long they take. If the pair takes about as long as the whole run did, R itself is serialising; if it takes about half as long, the problem lies elsewhere.
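A rough sketch of that timing experiment, in Python purely for illustration (the same thing can be done by hand from two shells). The function names are made up for this example, and the children here only sleep so that the sketch runs without R; sleeping does not exercise the cores, so for the real test each command would be something like `["Rscript", "half1.R"]` and `["Rscript", "half2.R"]`, where the file names are hypothetical.

```python
import subprocess
import sys
import time


def run_sequential(commands):
    """Run each command to completion before starting the next.

    Returns the total wall-clock time in seconds.
    """
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def run_concurrent(commands):
    """Start every command at once, then wait for all of them.

    Returns the total wall-clock time in seconds.
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    for proc in procs:
        if proc.wait() != 0:
            raise RuntimeError(f"command failed: {proc.args}")
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload (an assumption, not the real job): each child sleeps.
    # Replace with the two halves of the logged R commands for the real test.
    job = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    print(f"sequential: {run_sequential([job, job]):.2f}s")
    print(f"concurrent: {run_concurrent([job, job]):.2f}s")
```

If the concurrent figure for the two R halves comes out close to the sequential one, something is serialising them outside of Perl.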


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP PCW It is as I've been saying!(Audio until 20090817)
Re^6: Using threads to run multiple external processes at the same time
by kikuchiyo (Hermit) on Sep 04, 2009 at 22:43 UTC
    Thanks for looking at it.

    Possibility 1 is out of the question: it's a C2D E7300, or it's damn good at pretending to be one. :)

    Possibility 3 is also out, I think. I replaced the function that talks to R with one that simply calls one of my dumb C Mandelbrot renderers: the 2-threaded version finished in half the time, confirming that the CPU is capable of running two processes truly in parallel. This also suggests that it is not my implementation that's at fault.

    This leaves possibility 2, which seems to suggest that something fishy is going on with R when it comes to multithreading. Maybe it's as you say that there are two R interpreter instances, but only one backend?

    I'll try to isolate the issue. Thanks again.
Re^6: Using threads to run multiple external processes at the same time
by kikuchiyo (Hermit) on Sep 04, 2009 at 23:27 UTC
    It seems that you were right.

    I found something relevant on the R mailing lists:

    > Specifically if it is possible to ask R to run a given R-program
    > from withing a posix thread (on linux) without providing a Mutex that
    > would serialise access to R process.

    No. You need to make sure that only one thread calls R, which means having some sort of handler to queue the commands.

    This means that the threaded approach is useless. Back to square one.
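    For what it's worth, the "handler to queue the commands" mentioned in that reply could look roughly like the sketch below: one worker thread owns the non-thread-safe backend, and every other thread hands it commands through a queue. This is in Python for illustration only; `SerialHandler` and the `backend` callable are invented names, and nothing here is R's actual API.

```python
import queue
import threading


class SerialHandler:
    """Funnel calls from many threads through a single worker thread."""

    def __init__(self, backend):
        # backend: any callable that must never run in two threads at once
        # (a stand-in for "the thing that talks to R"; an assumption).
        self._backend = backend
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Only this thread ever invokes the backend.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown marker
                break
            command, reply = item
            try:
                reply.put((True, self._backend(command)))
            except Exception as exc:
                reply.put((False, exc))

    def call(self, command):
        """Queue a command and block until the worker has processed it."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((command, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        """Stop the worker after it has drained the pending requests."""
        self._requests.put(None)
        self._worker.join()
```

    Any number of threads can then call `handler.call(...)`, but the backend only ever sees one command at a time, which is exactly why this arrangement gives no speed-up for the backend work itself.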
      "This means that the threaded approach is useless."

      Hm. I'm not sure that is true.

      It's unclear to me from the 3 posts in that thread whether they are talking about talking to multiple processes from different threads--as you are trying to do--or whether they are talking about talking to R.dll from multiple threads when embedding R in a C/C++ program.

      At one point the OP talks of "calling R", at another "the R process". And most of the "threads" discussion by the 2 experts seems to be about threading R internally--i.e. within a single R process--rather than having two process instances running concurrently.

      I remember many of the dlls in OS/2 v1.x were inherently thread-unsafe, mostly because they were written in C by ex-COBOL programmers who hadn't quite gotten over the 'static data section' way of thinking. But I didn't think anyone still coding for a living was still doing stuff like that.

      By far the simplest way of verifying this would be to run something in each of two concurrent interactive sessions that takes an appreciable amount of time--a minute or two--and see if the time is overlapped or serialised. I have two Rgui sessions running now, but I don't know enough about R to come up with something that doesn't complete instantaneously :(


        I'll look into it; the problem is that I can only do it when I'm back at work on Monday.

        Meanwhile I'll try to ask around on the R-help mailing list, maybe they will be able to clarify the issue.

        And I think I'll write a "Tarzan-be-strong-Tarzan-makes-other-hole" version of my program that uses a shared directory to distribute workload to networked clients, like I was advised earlier in this thread. I should probably be working on something useful and productive instead, but I'm a stubborn bastard and I really want to solve this problem now.
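        A minimal sketch of how the shared-directory idea might hand out work, assuming each job is a file in a `pending` directory and a client claims one by renaming it into a `claimed` directory. This is Python for illustration only; the function name, directory layout and worker-id suffix are all invented for this example. It relies on rename being atomic within one filesystem, which holds on a local POSIX filesystem but should be verified for whatever network share is actually used.

```python
import os
import socket


def claim_next_job(pending_dir, claimed_dir, worker_id=None):
    """Try to claim one job file.

    Returns the path of the claimed file, or None if nothing is left.
    """
    # Default id is host name plus process id, so two clients never
    # rename onto the same destination name.
    worker_id = worker_id or f"{socket.gethostname()}-{os.getpid()}"
    for name in sorted(os.listdir(pending_dir)):
        src = os.path.join(pending_dir, name)
        dst = os.path.join(claimed_dir, f"{name}.{worker_id}")
        try:
            # Whichever client's rename succeeds owns the job.
            os.rename(src, dst)
        except FileNotFoundError:
            continue  # another client got there first; try the next file
        return dst
    return None
```

        Each client would loop on `claim_next_job(...)` until it returns `None`, processing whatever file it was given in between.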