Re: Useful number of childs revisited [SOLVED]

Re^2: Useful number of childs revisited [SOLVED] by BrowserUk (Patriarch) on May 08, 2015 at 18:37 UTC
"so that the process is heavily IO-bound, but also in part CPU-bound."
In that case, your process isn't either of those. It's just regular mixed processing; and in reality, as it's extracting a very large volume of data, it probably qualifies as memory-bound.
"we usually set the queue to a maximum of about 10 processes running in parallel for our 4-CPU server"
With a mixed-mode process, that is the sensible choice, as it allows for greater utilisation of both resources. Whilst some processes are hogging the CPUs in their cpu-bound sections, other processes can still be making forward progress because they have IO completing whilst they are not occupying a cpu. And whilst some processes are waiting for IO to complete, there are other processes that can utilise the cpus that would otherwise stand idle waiting for that IO to complete.
But for pure cpu-bound processing, running more processes than there are cpus has the effect of a net increase in overall elapsed time, because it causes more context switches and more cache misses.
In an ideal world of 1 process per cpu, those processes will tend to always occupy the same cpus; thus the caches, especially the L1 caches closest to the cpus, will retain the same data across preemptions. And when preemptions occur, as there are no other processes to be run, the same processes just get another timeslice and pick up right from where they left off.
It's very rarely an ideal world on a modern OS; there are always lots of other system processes vying for a cpu. But it is still the case that many of those system processes do very little when they get a timeslice -- often just checking one or two flags or ports before relinquishing the rest of their allotment to the next process.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
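
For anyone who wants to play with the "M jobs, N at a time" queue discussed in this post, here is a minimal sketch, written in Python only to keep it short. `run_limited` and `make_job` are hypothetical names, not anything from the original setup; the sleep and the sum merely stand in for the IO wait and the cpu-bound section of a mixed job. Threads are used to keep the example small; for genuinely cpu-bound work in Python you would want separate processes instead.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_limited(jobs, max_parallel):
    """Run every callable in `jobs`, never more than `max_parallel` at once.

    Returns (results_in_job_order, peak_number_running_together).
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(job):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return job()
        finally:
            with lock:
                running -= 1

    # The pool size is the queue limit: extra jobs wait for a free slot.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(tracked, jobs))
    return results, peak


def make_job(n):
    # Hypothetical mixed job: the sleep stands in for waiting on IO,
    # the sum stands in for a cpu-bound section.
    def job():
        time.sleep(0.01)
        return sum(range(n))
    return job


if __name__ == "__main__":
    jobs = [make_job(n) for n in range(1, 21)]          # M = 20 jobs
    results, peak = run_limited(jobs, max_parallel=4)   # N = 4 at a time
    print(len(results), "jobs done, at most", peak, "running together")
```

Setting `max_parallel` above the core count (such as 10 on a 4-CPU box) only pays off when jobs spend a fair share of their time waiting on IO; for purely cpu-bound jobs, a limit equal to the core count is the usual starting point.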
Re^3: Useful number of childs revisited [SOLVED] by Laurent_R (Canon) on May 08, 2015 at 22:20 UTC
"In that case, your process isn't either of those. It's just regular mixed processing; and in reality as its extracting a very large volume of data, it probably qualifies as memory-bound."
BrowserUk, thank you for your comment.
Yeah, I guess "regular mixed processing" is a pretty fair description of our data extraction processes, but, still, most are usually more IO-bound than anything else. A few of them are algorithmically more complex or may, for example, require a certain amount of sorting, so these might be more mixed processing and possibly also memory-bound to a certain extent. But I do not think that "memory-bound" is right for most of our processes.
Basically (and with some simplification, because there is quite a lot of business logic involved), the typical application we are running is doing something like this: reading every active subscriber from the database subscriber table; for each of these cell phone subscribers, going into two or three dozen other tables to look for more detailed information about the billing services, network services, supplementary services, commercial segment, applicable rate plan, prepaid amounts, last bill date, next bill date, etc., for this subscriber; and, once all the relevant information about this subscriber has been collected, writing it into a CSV flat file that will be used for further processing later on. The CSV line we are writing for one subscriber rarely exceeds several hundred bytes (a few thousand at most), but it still leads to a large data volume, because there are about 35 million subscribers to be extracted, so that we are producing files ranging from several GB to tens of GB.
At no point in this process do most of these programs directly use a lot of memory. Very little in fact, usually. But, as I mentioned in my previous post, the underlying database engine and the system may use quite a bit of memory for IO buffering, data caching, transaction maintenance and so on, and these are things over which we have only limited or no control.
Having said that, there are some exceptions: some of our programs need to load a lot of reference data into memory (at least three parameter tables associated with the call rating engine exceed one million records), but these programs are very different and do not require parallel processing, because they don't scan the full customer database but usually reprocess files (error calls, unallocated calls, error logs) whose sizes never exceed half a GB and are usually much smaller (typically a few MB).
Also note that I discussed only one of our regular activities on one specific platform (two servers). We are doing many other things on other platforms, other applications and other OSes, but this activity is more or less the only one (that I know of) in which we really need to fine-tune a lot of parallel processing as much as we can to improve performance.
Je suis Charlie.
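
To make the shape of such an extraction concrete, here is a rough sketch in Python, with SQLite standing in for the real database. The table and column names (`subscriber`, `rate_plan`, `msisdn`, `plan`) and the helper `demo_db` are invented for illustration and are not the real schema; a single lookup stands in for the two or three dozen detail tables. The point it tries to show is that the client streams one subscriber at a time and writes one CSV line at a time, so its own memory use stays small whatever the size of the output.

```python
import csv
import io
import sqlite3


def extract(conn, out):
    """Write one CSV line per active subscriber, one subscriber at a time."""
    writer = csv.writer(out, delimiter=";")
    lookup = conn.cursor()
    count = 0
    # Iterating over the cursor streams rows instead of loading them all.
    for sub_id, msisdn in conn.execute(
        "SELECT id, msisdn FROM subscriber WHERE active = 1 ORDER BY id"
    ):
        # Stand-in for one of the many per-subscriber detail lookups.
        row = lookup.execute(
            "SELECT plan FROM rate_plan WHERE subscriber_id = ?", (sub_id,)
        ).fetchone()
        plan = row[0] if row else ""
        writer.writerow([sub_id, msisdn, plan])
        count += 1
    return count


def demo_db():
    # Tiny in-memory database with made-up rows, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subscriber (id INTEGER PRIMARY KEY, msisdn TEXT, active INTEGER);
        CREATE TABLE rate_plan (subscriber_id INTEGER, plan TEXT);
        INSERT INTO subscriber VALUES
            (1, '0600000001', 1), (2, '0600000002', 0), (3, '0600000003', 1);
        INSERT INTO rate_plan VALUES (1, 'prepaid'), (3, 'postpaid');
        """
    )
    return conn


if __name__ == "__main__":
    out = io.StringIO()
    n = extract(demo_db(), out)
    print(n, "subscribers written")
    print(out.getvalue())
```

As a sanity check on the volumes quoted above: at roughly 300 bytes per line, 35 million lines come to about 10.5 GB, which sits inside the "several GB to tens of GB" range.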
Re^4: Useful number of childs revisited [SOLVED] by BrowserUk (Patriarch) on May 09, 2015 at 04:04 UTC
"But I do not think that "memory-bound" is right for most of our processes."
Well, I was only talking about the one process I thought you were describing (extracting a very large volume of data), and I did indicate it was a guess based upon your description by using the word "probably"; so no biggy.
The thing I was getting at is that XXX-bound means that performance (measured in wall-time, not cpu time) is constrained by XXX. That is:
CPU-bound: If you could insert a faster processor -- with nothing else changed -- the process would complete sooner.
IO-bound: If you could speed up IO -- with nothing else changed -- the process would complete sooner.
Memory-bound: If you added more memory -- with nothing else changed -- the process would complete sooner.
With DB apps (often; not always), things get muddy because: a) they often have bursts of both cpu-intensive & IO-intensive processing; b) often a lot of the cpu-intensive work (searching/selecting/sorting/grouping) is done by a completely different process (the DB engine), often as not on a completely different box or even cluster of boxes.
So when trying to categorise the overall processing (rather than the individual process) for a given task, you have to assess the complete end-to-end processing requirements of that task.
Often, the client application itself would be considered IO-bound because it spends the greatest proportion of its time waiting for IO from the DB. But if you consider the overall processing, it might be considered cpu-bound because the IO-rate from the DB is limited by cpu-bound calculations within the DB. But finally, the overall thing might be memory-bound because the DB engine's processing requires it to break its cpu-intensive calculation into chunks, because the overall dataset it is processing is greater than can be loaded into its memory.
Bottom line is that the essentially throw-away comment in that other thread, which started this one, is very generic: all else being equal, running M processes, N-at-a-time on an N-core system will be faster than running those same M processes all concurrently; but as I showed above, all else is very rarely equal.
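
One cheap way to get a first hint about which side of the wall-time versus cpu-time distinction a client process falls on is to measure both for the same piece of work. The sketch below, again in Python, does that with the standard `time.perf_counter` and `time.process_time` clocks; `measure`, `cpu_share`, `waits` and `computes` are hypothetical names made up for the illustration. It only sees the client process: as noted above, work done by a separate DB engine shows up here as waiting, so a low cpu share says "this process is waiting", not what it is waiting for.

```python
import time


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) spent running func(*args)."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # cpu time used by this process
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def cpu_share(wall, cpu):
    """Fraction of the wall time during which this process was on a cpu."""
    return cpu / wall if wall > 0 else 0.0


def waits(seconds):
    # Stands in for waiting on a DB reply or a disk.
    time.sleep(seconds)


def computes(n):
    # Stands in for a cpu-bound section.
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    for name, func, arg in (("waits", waits, 0.2), ("computes", computes, 2_000_000)):
        wall, cpu = measure(func, arg)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s share={cpu_share(wall, cpu):.2f}")
```

A share near 1 suggests a faster processor would help that section; a share near 0 suggests the time is going elsewhere, and the end-to-end picture (DB engine, disks, memory) has to be looked at before picking a label.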
Re^2: Useful number of childs revisited [SOLVED] by karlgoethebier (Abbot) on May 08, 2015 at 18:49 UTC
Thank you Laurent for sharing your secret knowledge.
"You can't say that" he said
"It doesn't, and you can't, I won't, and it don't
it hasn't, it isn't, it even ain't, and it shouldn't
it couldn't"
He told him, "No, no, no"
I told him, "Yes, yes, yes"
I said, "I do it all the time
Ain't this boogie a mess"? (FZ)
Any time I put my hands on this stuff I feel a bit like this. And to be honest: this is one of the reasons why I always blenched from using threads.
Best regards, Karl
«The Crux of the Biscuit is the Apostrophe»
Re^3: Useful number of childs revisited [SOLVED] by Laurent_R (Canon) on May 10, 2015 at 00:23 UTC
Hi Karl,
Hmm, I replied to your post more than 24 hours ago (immediately after my answer to BrowserUk), but it seems that I forgot to actually post after preview.
I am not sure what I should understand by "secret knowledge", especially in view of the link. There is nothing secret there; I was only reporting results obtained in our team after many lengthy tests and benchmarks, nothing more. They only make sense for what our processes do, and probably not for different processes. As I said, I still thought that these results were not off-topic and were relevant. Sorry if they were off-topic and irrelevant.
Having said that, I am puzzled by your answer, which is why I ask, but by no means do I find myself offended by it.
Re^4: Useful number of childs revisited [SOLVED] by karlgoethebier (Abbot) on May 10, 2015 at 10:23 UTC
"...I am puzzled with your answer..."
Dear Laurent,
No need to worry - it was just self-irony. But every joke has a serious background: the uninitiated tend to label things that they don't understand intuitively as "secret knowledge". This applies to any métier.
On closer examination, "secret knowledge" is simply the result of long, hard and systematic work: "...I was only reporting results obtained in our team after many lengthy tests and benchmarks..."
In other words: before intuition there is a lot of ~~iteration~~ repetition.
Please see also Re: poll ideas quest 2015. I'm working hard not to take myself too seriously. I hope that helps ;-)
Edit: Minor change of wording.
My very best regards, Karl

