in reply to Re^9: Problem in Inter Process Communication
in thread Problem in Inter Process Communication

Hi browserUk,

Thanks for your help !!!

Well, I have been assigned a task to migrate a regression suite from C++ to Perl. The suite has to fire queries at both prod and test and then compare the results. The queries are fired at a few app servers. The number of clients that must be substituted into each query is around 10K, and the number of queries is around 5K. The result of a fired query can contain 10K records at times. Firing a query generally takes much less time than the comparison, and that is the primary reason why I want to segregate things this way. Each query must have every client substituted into it and then be fired onto the app servers.
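For concreteness, the work list could be expanded along these lines -- a minimal sketch only, assuming each query carries a placeholder token (here <CLIENT>) that gets substituted with each client id in turn:

    use strict;
    use warnings;

    my @queries = ( 'SELECT * FROM orders WHERE client = <CLIENT>' );  # ~5K of these in practice
    my @clients = map { sprintf 'client_%05d', $_ } 1 .. 3;            # ~10K of these in practice

    my @work_items;
    for my $query (@queries) {
        for my $client (@clients) {
            ( my $q = $query ) =~ s/<CLIENT>/$client/g;   # substitute the client into the query
            push @work_items, [ $q, $client ];            # one work item per (query, client) pair
        }
    }
    printf "generated %d work items\n", scalar @work_items;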

Re^11: Problem in Inter Process Communication
by BrowserUk (Patriarch) on Aug 22, 2008 at 09:26 UTC
    Firing a query generally takes much less time than the comparison, and that is the primary reason why I want to segregate things this way.

    Actually, that's a very strong argument for not splitting your workers. Let's assume that a query takes 1/10th the time of a comparison. If you have 10 Queriers, then you would need at least 100 Comparers to keep up, and that's if the communications and data transfers between them took no time at all, which isn't the case. So now you have 100 Comparers, each holding 10K records of data. Except that transferring data from one place to another is going to involve duplication, which can double or treble the memory consumption.

    And if the Comparers are slower than the Queriers, then the latter are going to run away, stacking up work for the former and filling memory. So now you have to consider adding semaphores to interlock the queues and prevent runaway and memory meltdown. And that adds complexity, with the need for synchronisation, the risk of deadlocking, and all that nasty stuff.
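    For a sense of the extra machinery that implies, one common way to bound the pending-results queue is a counting semaphore -- a sketch only, with the queue, limit, and helper names purely illustrative:

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;
        use Thread::Semaphore;

        my $MAX_PENDING = 100;                              # cap on results queued but not yet compared
        my $Qresults    = Thread::Queue->new;
        my $free_slots  = Thread::Semaphore->new($MAX_PENDING);

        # Querier side: block until a slot is free before enqueueing a result.
        sub querier_enqueue {
            my ($result) = @_;
            $free_slots->down;              # waits here if the Comparers have fallen behind
            $Qresults->enqueue($result);
        }

        # Comparer side: release the slot once the result has been taken off the queue.
        sub comparer_dequeue {
            my $result = $Qresults->dequeue;
            $free_slots->up;
            return $result;
        }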

    On the other hand, if you stick with one type of worker that picks a work item, fires the query, retrieves the data, performs the comparison, cleans up, and goes back for the next work item, you have a very straightforward linear flow. Things can never get out of sync. You can never get runaway. To do the work more quickly, you simply start more workers.
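    As a minimal sketch of that flow -- assuming hypothetical fire_query() and compare_results() routines that you would supply -- using threads and Thread::Queue:

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my $WORKERS = 4;                    # start at 1, then 2, then 1 per CPU, and measure
        my $Qwork   = Thread::Queue->new;

        # Placeholder stubs -- replace with your real query and comparison code.
        sub fire_query      { my ( $env, $query, $client ) = @_; return "result-from-$env"; }
        sub compare_results { my ( $query, $client, $prod, $test ) = @_; return $prod eq $test; }

        sub worker {
            while ( defined( my $item = $Qwork->dequeue ) ) {
                my ( $query, $client ) = @$item;
                my $prod = fire_query( 'prod', $query, $client );
                my $test = fire_query( 'test', $query, $client );
                compare_results( $query, $client, $prod, $test );
                # clean up here, then go back for the next work item
            }
        }

        my @pool = map { threads->create( \&worker ) } 1 .. $WORKERS;

        # one work item per (query, client) pair -- built however suits your suite
        my @work_items = ( [ 'some query', 'client_0001' ] );
        $Qwork->enqueue($_) for @work_items;

        $Qwork->enqueue( (undef) x $WORKERS );   # one undef per worker ends the pool
        $_->join for @pool;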

    There will be a limit to how many you can run concurrently, defined by the memory available, the CPU power required for the processing, or the IO bandwidth. Which limit you hit first will very much depend upon the details of the task and the hardware involved. But with the simple architecture, there is only one variable to adjust. I strongly recommend the simple approach.

    If, once you have that coded and running and can see how it performs, you find out which limitation forms the boundary to scalability, then on the basis of that knowledge you can consider tweaking the architecture to address it. But even with the well-configured 12-CPU machine you described elsewhere, trying to guess up front whether your process will be CPU, memory, or IO bound at the limit is simply not possible.

    I strongly advise sticking with the simple model. Make it work for 1 worker, and then 2. Once you're absolutely sure that it works correctly in both those configurations, then start ramping up the number of workers. Start with 1 thread per CPU and see how that affects your throughput rate. Then try doubling the number and test again.

    Note the throughput rate (from query issued to comparison completed and cleanup performed), memory consumption, and CPU load average at each change of concurrency level. After a few short tests, each of say 40 or 50 completed cycles, you should be able to plot a graph or two that will show you how the numbers vary with the number of threads. And that should allow you to plan the best production-run strategy.
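    A throwaway harness along these lines would give you the numbers to plot -- run_pool() here is just a stand-in for driving your pool over a batch of work items; note memory consumption and load average from top or vmstat alongside each run:

        use strict;
        use warnings;
        use Time::HiRes qw( time );

        # placeholder: replace with a call that drives your worker pool over $cycles items
        sub run_pool {
            my ( $nthreads, $cycles ) = @_;
            select( undef, undef, undef, 0.25 );    # simulate some work for this sketch
        }

        for my $nthreads ( 1, 2, 4, 8, 12, 24 ) {   # concurrency levels to compare
            my $cycles = 50;                        # a short run of ~50 completed cycles
            my $start  = time;

            run_pool( $nthreads, $cycles );

            my $elapsed = time - $start;
            printf "threads=%-2d  cycles=%d  elapsed=%6.2fs  throughput=%6.2f cycles/s\n",
                $nthreads, $cycles, $elapsed, $cycles / $elapsed;
        }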

    But make it work for one thread and then two threads, and fully test its correctness in both scenarios first! I cannot emphasise enough the importance of making sure it works properly before you move into a performance-testing phase.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Also, why can't I use fork to create multiple processes rather than threads?

        can't I use fork to create multiple processes rather than threads?

        Yes you can. But again, what benefits are you seeking to gain?
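        For illustration, a fork-based pool can be as simple as dealing the work list out to a fixed number of children -- a rough sketch, with placeholder work items:

            use strict;
            use warnings;

            my $WORKERS    = 4;
            my @work_items = map { "item-$_" } 1 .. 100;    # placeholder work list

            for my $w ( 0 .. $WORKERS - 1 ) {
                # deal the work out round-robin, one slice per child
                my @slice = @work_items[ grep { $_ % $WORKERS == $w } 0 .. $#work_items ];

                defined( my $pid = fork ) or die "fork failed: $!";
                next if $pid;                               # parent: go and start the next child

                for my $item (@slice) {                     # child: process its slice, then exit
                    # fire the query, do the comparison, clean up -- your real code goes here
                }
                exit 0;
            }

            1 while wait() != -1;                           # parent: wait for all the children

        Bear in mind that forked children don't share memory with the parent, so any results or statistics you want gathered in one place have to come back via pipes, files, or a database -- exactly the kind of data-transfer cost discussed above.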

        You could also use a single thread, a single process, and non-blocking IO. I wouldn't recommend it, especially as you have two discrete parts to your processing with significantly different processing-time requirements: one IO bound, one CPU bound; but you could do it that way.
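        A very rough sketch of the shape of that approach -- assuming, purely for illustration, that the app servers speak a plain socket protocol on hypothetical host:port endpoints:

            use strict;
            use warnings;
            use IO::Socket::INET;
            use IO::Select;

            my @hosts = ( 'appserver1:9000', 'appserver2:9000' );   # hypothetical endpoints
            my $sel   = IO::Select->new;
            my %buf;

            for my $host (@hosts) {
                my $sock = IO::Socket::INET->new( PeerAddr => $host, Blocking => 0 )
                    or next;                                         # skip hosts we can't reach
                $sel->add($sock);
                $buf{$sock} = '';
            }

            # one process polls every connection; comparisons happen inline as responses complete
            while ( my @ready = $sel->can_read(1) ) {
                for my $sock (@ready) {
                    my $n = sysread( $sock, my $chunk, 64 * 1024 );
                    if ( !$n ) {                                     # EOF or error: response finished
                        $sel->remove($sock);
                        # compare $buf{$sock} against the other environment's result here
                        next;
                    }
                    $buf{$sock} .= $chunk;
                }
            }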

        If you had the time and interest, coding the same problem all 3 ways would be an interesting comparative exercise, but people rarely have the time or budget for such explorations.

        The nice thing about the single-queue, worker-pool threaded solution is that it is really easy to get going--you've almost written it three times during these discussions--and once it is working, it scales very easily and well.

        It is a great way to prototype something that works, and then use that to explore where the limits and bottlenecks are. Once you know whether, at the limits of your hardware, you are IO-bound, CPU-bound, or memory-bound, you can use that information to consider whether one of the other solutions would allow you to transcend that boundary.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^11: Problem in Inter Process Communication
by JoeKamel (Acolyte) on Aug 22, 2008 at 12:28 UTC

    Okay, so, in the vein of what BrowserUk is suggesting, and based upon your better description of the problem, what I would look to do is:

    1. Fire up a fixed number of query threads.
    2. Associate one output Queue with those -- take responses from each of the query threads and prepend them with data about which client they came from, etc. (used later on)
    3. Fire up a fixed number of comparison threads
    4. Associate an input Queue with each of these
    5. Use the main program as the controller -- to pull data from the query queue and use the prepended data to direct the flow to the appropriate comparison queue (see the sketch after this list)
    6. Optionally associate an output queue with all workers so that they can send their final status (comparison good, bad) back to the main polling loop. Note that if you do this, you will likely want to use the same sort of magic, prepending some control data into the queue data.
    7. This will allow you to do flow control on the queries (so that you don't get too backed up on the comparison workers), as well as to tightly manage when new queries get fired off (no sense in firing off too many outstanding queries if you don't have any comparison workers available). It will also allow you to scale slowly and surely.
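    A minimal sketch of steps 1-7 using threads and Thread::Queue -- with placeholder query and comparison code, and simple round-robin routing standing in for the real flow control:

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my $N_QUERY   = 2;
        my $N_COMPARE = 4;

        my $Qwork      = Thread::Queue->new;                            # (query, client) pairs in
        my $Qresponses = Thread::Queue->new;                            # step 2: one shared output queue
        my @Qcompare   = map { Thread::Queue->new } 1 .. $N_COMPARE;    # step 4: one input queue each
        my $Qstatus    = Thread::Queue->new;                            # step 6: status back to main

        # Step 1: query workers. Each response is prepended with its client id (step 2).
        sub query_worker {
            while ( defined( my $item = $Qwork->dequeue ) ) {
                my ( $query, $client ) = @$item;
                my $response = "response-for-$client";                  # placeholder: fire the real query here
                $Qresponses->enqueue( [ $client, $response ] );
            }
        }

        # Step 3: comparison workers, each reading its own queue and reporting status (step 6).
        sub compare_worker {
            my ($my_q) = @_;
            while ( defined( my $item = $my_q->dequeue ) ) {
                my ( $client, $response ) = @$item;
                my $ok = 1;                                             # placeholder: real comparison here
                $Qstatus->enqueue( [ $client, $ok ? 'good' : 'bad' ] );
            }
        }

        my @queriers  = map { threads->create( \&query_worker ) } 1 .. $N_QUERY;
        my @comparers = map { threads->create( \&compare_worker, $_ ) } @Qcompare;

        # feed the work, then close the work queue
        $Qwork->enqueue( [ 'some query', "client_$_" ] ) for 1 .. 10;
        $Qwork->enqueue( (undef) x $N_QUERY );
        $_->join for @queriers;
        $Qresponses->enqueue(undef);                                    # all queries done

        # Step 5: main program as controller, routing responses to comparers.
        # Round-robin here; this is also where the flow control from step 7 would live.
        my $next = 0;
        while ( defined( my $item = $Qresponses->dequeue ) ) {
            $Qcompare[ $next++ % $N_COMPARE ]->enqueue($item);
        }
        $_->enqueue(undef) for @Qcompare;
        $_->join for @comparers;

        # drain the status queue
        while ( defined( my $s = $Qstatus->dequeue_nb ) ) {
            printf "%s: %s\n", @$s;
        }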

    You could get a 1:1 (query to comparison) model going very quickly, then add a couple or a few queries to test all of the flow control, and then scale in some more comparison workers.

    Just my $0.02