Re: Useful number of childs revisited [SOLVED]

by Laurent_R (Canon)
on May 08, 2015 at 17:57 UTC


in reply to Useful number of childs revisited [SOLVED]

I will talk about heavyweight OS shell processes, not threads, and certainly not Parallel::ForkManager. This does not even have much to do with Perl (although Perl is used in our programs), but I still believe it is not off-topic and is actually quite relevant.

At my job, we very often extract data from seven large databases, each having eight sub-databases. This is not just table dumping: there is a lot of business logic in the extraction, so the process is heavily IO-bound, but also in part CPU-bound. These extractions are very long: from 4 to 8 hours for most, and up to 3 or 4 days for a couple of them for the full data extraction to complete (and yes, we are extracting a very large volume of data).

What we do is launch 7 * 8 = 56 processes through a queuing system and cap the number of active processes at a certain level; the other processes just stay pending, doing nothing, until a slot becomes free for one of them. We have 4 CPUs on our server. We found that, usually, the optimal number of processes running concurrently is somewhere between 8 and 12. (We have about 50 different data extraction applications, some more heavily IO-bound than others, so the optimal number of processes varies to a certain extent with the nature of the extraction being run.)

With fewer than 8 processes in parallel, the server appears to be underutilized (although we are doing a few other things on this server, it is essentially dedicated to these heavy extraction tasks). With more than 12 processes, it appears that the overhead of context switches starts to slow down overall execution (the processes themselves are not very memory-intensive, but there could be some underlying data caching, buffering and pre-fetching leading to a real memory consumption higher than what we think).

Anyway, in view of that, we usually set the queue to a maximum of about 10 processes running in parallel for our 4-CPU server.
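
For illustration only, here is a minimal sketch of that kind of capped queue. It is not our actual queuing system: it is written in Python for brevity, `run_queue` and `max_active` are hypothetical names, and the 56 jobs are trivial stand-in commands rather than real extractions.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_queue(commands, max_active=10):
    """Run each command as its own OS process, at most max_active at a time.

    Pending commands wait until a slot becomes free.
    Returns the exit codes in the same order as the commands.
    """
    def run_one(cmd):
        # Each worker thread just blocks while its child process runs.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_active) as pool:
        return list(pool.map(run_one, commands))


# Hypothetical stand-ins for the 7 * 8 = 56 extraction jobs.
jobs = [[sys.executable, "-c", "pass"] for _ in range(7 * 8)]
```

Calling `run_queue(jobs, max_active=10)` would start the first 10 jobs immediately and launch each remaining one as soon as a running job exits. The right value for `max_active` still has to be found by measurement, as described above.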

Je suis Charlie.
Re^2: Useful number of childs revisited [SOLVED]
by BrowserUk (Patriarch) on May 08, 2015 at 18:37 UTC
    so that the process is heavily IO-bound, but also in part CPU-bound.

    In that case, your process isn't either of those. It's just regular mixed processing; and in reality, as it's extracting a very large volume of data, it probably qualifies as memory-bound.

    we usually set the queue to a maximum of about 10 processes running in parallel for our 4-CPU server

    With a mixed-mode process, that is the sensible choice, as it allows for greater utilisation of both resources.

    • Whilst some processes are hogging the CPUs in their cpu-bound sections, other processes can still be making forward progress because they have IO completing whilst they are not occupying a cpu.
    • And whilst some processes are waiting for IO to complete, there are other processes that can utilise the cpus that would otherwise stand idle waiting for that IO to complete.

    But for pure cpu-bound processing, running more processes than there are cpus has the effect of a net increase in overall elapsed time, because it causes more context switches and more cache misses.

    In an ideal world of 1 process per cpu, those processes will tend to always occupy the same cpus; thus the caches, especially the L1 caches closest to the cpus, will retain the same data across preemptions. And when preemptions occur; as there are no other processes to be run, the same processes just get another timeslice and pick up right from where they left off.

    It's very rarely an ideal world on a modern OS; there are always lots of other system processes vying for a cpu. Still, many of those system processes do very little when they get a timeslice -- often just checking one or two flags or ports before relinquishing the rest of their allotment to the next process.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      In that case, your process isn't either of those. It's just regular mixed processing; and in reality as its extracting a very large volume of data, it probably qualifies as memory-bound.
      BrowserUk, thank you for your comment. Yeah, I guess "regular mixed processing" is a pretty fair description of our data extraction processes, but, still, most are usually more IO-bound than anything else. A few of them are algorithmically more complex or may for example require a certain amount of sorting, so these might be more mixed processing and possibly also memory-bound to a certain extent.

      But I do not think that "memory-bound" is right for most of our processes. Basically (and with some simplification, because there is quite an amount of business logic involved), the typical application we are running does something like this: it reads every active subscriber from the database subscriber table; for each of these cell phone subscribers, it goes into two or three dozen other tables to look for more detailed information about the billing services, network services, supplement services, commercial segment, applicable rate plan, prepaid amounts, last bill date, next bill date, etc.; and, once all the relevant information about this subscriber has been collected, it writes it into a CSV flat file that will be used for further processing later on.

      The CSV line we are writing for one subscriber rarely exceeds several hundred bytes (a few thousand at most), but it still leads to a large data volume, because there are about 35 million subscribers to be extracted, so we are producing files ranging from several GB to tens of GB.
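
      As a rough check of those orders of magnitude, the arithmetic can be sketched as follows. The per-line sizes of 100, 300 and 1000 bytes are illustrative assumptions, not measured values, and `estimated_size_gb` is a hypothetical helper.

```python
def estimated_size_gb(subscribers, bytes_per_line):
    """Approximate CSV size in GB (10**9 bytes), one line per subscriber."""
    return subscribers * bytes_per_line / 1e9


SUBSCRIBERS = 35_000_000  # about 35 million subscribers, as stated above

# Assumed average line sizes in bytes, chosen only for illustration.
for bytes_per_line in (100, 300, 1000):
    size = estimated_size_gb(SUBSCRIBERS, bytes_per_line)
    print(f"{bytes_per_line:>5} bytes/line -> {size:.1f} GB")
```

      With those assumed line sizes the estimate runs from 3.5 GB to 35 GB, which is consistent with "several GB to tens of GB".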

      At no point in this process do most of these programs directly use a lot of memory. Very little in fact, usually. But, as I mentioned in my previous post, the underlying database engine and the system may use quite a bit of memory for IO buffering, data caching, transaction maintenance and so on, and these are things over which we have only limited or no control.

      Having said that, there are some exceptions: some of our programs need to load a lot of reference data into memory (at least three parameter tables associated with the call rating engine exceed one million records). But these programs are very different and do not require parallel processing, because they don't scan the full customer database; they usually reprocess files (error calls, unallocated calls, error logs) whose sizes never exceed half a GB and are usually much smaller (typically a few MB).

      Also note that I discussed only one of our regular activities on one specific platform (two servers). We are doing many other things on other platforms, with other applications and other OSes, but this activity is more or less the only one (that I know of) in which we really need to fine-tune a lot of parallel processing as much as we can to improve performance.

      Je suis Charlie.
        But I do not think that "memory-bound" is right for most of our processes.

        Well, I was only talking about the one process I thought you were describing (extracting a very large volume of data) and I did indicate it was a guess based upon your description, by using the word "probably"; so no biggy.

        The thing I was getting at is that XXX-bound: means that performance (measured in wall-time not cpu time) is constrained by XXX. That is:

        1. CPU-bound: If you could insert a faster processor -- with nothing else changed -- the process would complete sooner.
        2. IO-bound: If you could speed up IO -- with nothing else changed -- the process would complete sooner.
        3. Memory-bound: If you added more memory -- with nothing else changed -- the process would complete sooner.
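
        One rough way to get a first hint about which case applies is to compare a task's CPU time with its wall time. The sketch below is only a heuristic under stated assumptions: it is written in Python, `cpu_fraction`, `waits` and `computes` are hypothetical names, and it only sees CPU time spent in the measuring process itself, not in a separate DB engine.

```python
import time


def cpu_fraction(task):
    """Return CPU time / wall time for one call of task().

    A value near 1 suggests the task was mostly computing;
    a value near 0 suggests it was mostly waiting (IO, sleep, a remote DB).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def waits():
    # Stand-in for time spent blocked on IO.
    time.sleep(0.2)


def computes():
    # Stand-in for a cpu-intensive section.
    sum(i * i for i in range(200_000))
```

        A low ratio does not say what the task is waiting for, so it cannot by itself distinguish slow IO from a lack of memory; it only separates "mostly computing" from "mostly waiting".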

        With DB apps (often; not always), things get muddy because: a) they often have bursts of both cpu-intensive & IO-intensive processing; b) often a lot of the cpu-intensive work (searching/selecting/sorting/grouping) is done by a completely different process (the DB engine); often-as-not on a completely different box or even cluster of boxes.

        So when trying to categorise the overall processing (rather than the individual process) for a given task, you have to assess the complete end-to-end processing requirements of that task.

        Often, the client application itself would be considered IO-bound because it spends the greatest proportion of its time waiting for IO from the DB.

        But if you consider the overall processing, it might be considered cpu-bound because the IO-rate from the DB is limited by cpu-bound calculations within the DB.

        But finally, the overall thing might be memory-bound because the DB engine's processing requires it to break its cpu-intensive calculation into chunks, because the overall dataset it is processing is greater than can be loaded into its memory.

        Bottom line is that the essentially throw-away comment in that other thread, the one that started this one, is very generic: all else being equal, running M processes N-at-a-time on an N-core system will be faster than running those same M processes all concurrently; but as I showed above, all else is very rarely equal.


Re^2: Useful number of childs revisited [SOLVED]
by karlgoethebier (Abbot) on May 08, 2015 at 18:49 UTC

    Thank you Laurent for sharing your secret knowledge.

    "You can't say that" he said "It doesn't, and you can't, I won't, and it don't it hasn't, it isn't, it even ain't, and it shouldn't it couldn't" He told him, "No, no, no" I told him, "Yes, yes, yes" I said, "I do it all the time Ain't this boogie a mess"? (FZ)

    Anytime i put my hands on this stuff i feel a bit like this.

    And to be honest: this is one of the reasons why i always blenched from using threads.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Hi Karl,

      Hmm, I replied to your post more than 24 hours ago (immediately after my answer to BrowserUk), but it seems that I forgot to actually post after preview.

      I am not sure what I should understand with "secret knowledge", especially in view of the link.

      There is nothing secret there; I was only reporting results obtained in our team after many lengthy tests and benchmarks, nothing more. They only make sense for what our processes do, and probably not for different processes.

      As I said, I still thought that these results were not off-topic and were relevant. Sorry if they were off-topic and irrelevant.

      Having said that, I am puzzled by your answer, which is why I ask, but by no means do I find myself offended by it.

      Je suis Charlie.
        "...I am puzzled with your answer..."

        Dear Laurent,

        No need to worry - it was just self-irony.

        But every joke has a serious background: the uninitiated tend to name things that they don't understand intuitively as "secret knowledge".

        This applies to any métier.

        With closer examination "secret knowledge" is simply the result of long, hard and systematic work:

        "...I was only reporting results obtained in our team after many lengthy tests and benchmarks..."

        In other words: before intuition there is a lot of iteration repetition.

        Please see also Re: poll ideas quest 2015.

        I'm working hard not to take myself too seriously.

        I hope that helps ;-)

        Edit: Minor change of wording.

        My very best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»
