PerlMonks  

Re^3: Useful number of childs revisited [SOLVED]

by Laurent_R (Canon)
on May 08, 2015 at 22:20 UTC ( [id://1126158]=note )


in reply to Re^2: Useful number of childs revisited [SOLVED]
in thread Useful number of childs revisited [SOLVED]

In that case, your process isn't either of those. It's just regular mixed processing; and in reality as it's extracting a very large volume of data, it probably qualifies as memory-bound.
BrowserUk, thank you for your comment. Yeah, I guess "regular mixed processing" is a pretty fair description of our data extraction processes, but, still, most of them are more IO-bound than anything else. A few are algorithmically more complex or may, for example, require a certain amount of sorting, so these might be closer to mixed processing and possibly also memory-bound to a certain extent.

But I do not think that "memory-bound" is right for most of our processes. Basically (and with some simplification, because there is quite an amount of business logic involved), the typical application we are running does something like this: it reads every active subscriber from the database subscriber table; for each of these cell phone subscribers, it goes into two or three dozen other tables to look for more specific, detailed information about the billing services, network services, supplementary services, commercial segment, applicable rate plan, prepaid amounts, last bill date, next bill date, etc.; and, once all the relevant information about this subscriber has been collected, it writes it to a CSV flat file that will be used for further processing later on.

The CSV line we write for one subscriber rarely exceeds several hundred bytes (a few thousand at most), but it still adds up to a large data volume, because there are about 35 million subscribers to be extracted, so we end up producing files ranging from several GB to tens of GB.
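As a rough sanity check of those volumes, here is a back-of-the-envelope sketch in Perl. Only the 35 million subscriber count comes from the figures above; the average line lengths and the `file_size_gb` helper are illustrative assumptions, not measured values or real code from our applications.

```perl
use strict;
use warnings;

my $subscribers = 35_000_000;    # subscriber count quoted above

# Estimated file size in decimal GB (1 GB = 1e9 bytes) for a given
# average CSV line length in bytes. Hypothetical helper, for illustration.
sub file_size_gb {
    my ($avg_line_bytes) = @_;
    return $subscribers * $avg_line_bytes / 1e9;
}

# The line lengths below are assumed values, not measurements.
printf "%5d bytes/line -> %5.1f GB\n", $_, file_size_gb($_)
    for 100, 300, 1000;
```

With those assumed line lengths the estimate comes out at roughly 3.5, 10.5 and 35 GB, which is consistent with "several GB to tens of GB".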

At no point in this process do most of these programs directly use a lot of memory. Very little, in fact, usually. But, as I mentioned in my previous post, the underlying database engine and the system may use quite a bit of memory for IO buffering, data caching, transaction maintenance and so on, and these are things over which we have only limited or no control.

Having said that, there are some exceptions: some of our programs need to load a lot of reference data into memory (at least three parameter tables associated with the call rating engine exceed one million records). But these programs are very different and do not require parallel processing, because they don't scan the full customer database; they usually reprocess files (error calls, unallocated calls, error logs) whose sizes never exceed half a GB and are usually much smaller (typically a few MB).

Also note that I discussed only one of our regular activities on one specific platform (two servers). We are doing many other things on other platforms, with other applications and other OSes, but this activity is more or less the only one (that I know of) in which we really need to fine-tune a lot of parallel processing as much as we can to improve performance.

Je suis Charlie.

Replies are listed 'Best First'.
Re^4: Useful number of childs revisited [SOLVED]
by BrowserUk (Patriarch) on May 09, 2015 at 04:04 UTC
    But I do not think that "memory-bound" is right for most of our processes.

    Well, I was only talking about the one process I thought you were describing (extracting a very large volume of data) and I did indicate it was a guess based upon your description, by using the word "probably"; so no biggy.

    The thing I was getting at is that XXX-bound means that performance (measured in wall-time, not cpu time) is constrained by XXX. That is:

    1. CPU-bound: If you could insert a faster processor -- with nothing else changed -- the process would complete sooner.
    2. IO-bound: If you could speed up IO -- with nothing else changed -- the process would complete sooner.
    3. Memory-bound: If you added more memory -- with nothing else changed -- the process would complete sooner.
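    One rough way to see which side of that line a single process falls on is to compare its wall-time with the cpu time it actually consumed. The following is a minimal sketch, not a definitive test: `measure` and the two sample workloads are hypothetical, and a low cpu/wall ratio only says the process spent its time waiting; it cannot tell IO waits from paging caused by memory pressure, nor does it see work done by a separate DB engine.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Run a code ref and report wall-clock seconds, cpu seconds (user + system,
# including any reaped children) and the cpu/wall ratio.
# Hypothetical helper, for illustration only.
sub measure {
    my ($code) = @_;
    my $wall0 = time;
    my @cpu0  = times;    # (user, system, child user, child system)
    $code->();
    my @cpu1 = times;
    my $wall = time - $wall0;
    my $cpu  = 0;
    $cpu += $cpu1[$_] - $cpu0[$_] for 0 .. 3;
    return {
        wall  => $wall,
        cpu   => $cpu,
        ratio => $wall > 0 ? $cpu / $wall : 0,
    };
}

# Assumed stand-ins: sleeping imitates a process waiting on IO,
# the arithmetic loop imitates cpu-intensive work.
my $waiting = measure( sub { sleep 0.5 } );
my $busy    = measure( sub { my $x = 0; $x += sqrt $_ for 1 .. 3_000_000 } );

printf "waiting: wall %.2fs cpu %.2fs ratio %.2f\n", @{$waiting}{qw(wall cpu ratio)};
printf "busy:    wall %.2fs cpu %.2fs ratio %.2f\n", @{$busy}{qw(wall cpu ratio)};
```

    A ratio near 1 suggests the process itself is cpu-bound; a ratio near 0 suggests it is mostly waiting on something else.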

    With DB apps (often; not always), things get muddy because: a) they often have bursts of both cpu-intensive & IO-intensive processing; b) often a lot of the cpu-intensive work (searching/selecting/sorting/grouping) is done by a completely different process (the DB engine); often-as-not on a completely different box or even cluster of boxes.

    So when trying to categorise the overall processing (rather than the individual process) for a given task, you have to assess the complete end-to-end processing requirements of that task.

    Often, the client application itself would be considered IO-bound because it spends the greatest proportion of its time waiting for IO from the DB.

    But if you consider the overall processing, it might be considered cpu-bound because the IO-rate from the DB is limited by cpu-bound calculations within the DB.

    But finally, the overall thing might be memory-bound because the DB engine's processing requires it to break its cpu-intensive calculation into chunks, because the overall dataset it is processing is greater than can be loaded into its memory.

    Bottom line is that the essentially throw-away comment in that other thread, which started this one, is very generic: all else being equal, running M processes N-at-a-time on an N-core system will be faster than running those same M processes all concurrently; but as I showed above, all else is very rarely equal.
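    For reference, "M processes, N-at-a-time" can be sketched with plain fork and wait. This is a minimal illustration under stated assumptions, not anyone's production code: `run_throttled` is a hypothetical name, the jobs are dummy code refs, and the limit of 4 is an assumed core count that would need tuning by measurement for a real workload.

```perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT, so children don't re-flush the parent's output

# Run every job (a code ref) in its own forked child, keeping at most
# $max_children alive at once. Returns a hash ref of pid => exit status.
# Hypothetical helper, for illustration only.
sub run_throttled {
    my ( $max_children, $jobs ) = @_;
    my ( %running, %status );

    # Block until one child exits and record its exit status.
    my $reap_one = sub {
        my $pid = wait;
        return 0 if $pid == -1;    # no children left to wait for
        $status{$pid} = $? >> 8;
        delete $running{$pid};
        return 1;
    };

    for my $job (@$jobs) {
        # At the limit: wait for a free slot before starting another child.
        if ( scalar( keys %running ) >= $max_children ) {
            $reap_one->() or last;
        }
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {         # child: do the work, then exit
            $job->();
            exit 0;
        }
        $running{$pid} = 1;        # parent: remember the child
    }

    # Reap whatever is still running.
    while ( keys %running ) {
        $reap_one->() or last;
    }
    return \%status;
}

# Assumed example: M = 10 dummy jobs, N = 4 at a time.
my @jobs = map { sub { select undef, undef, undef, 0.05 } } 1 .. 10;
my $done = run_throttled( 4, \@jobs );
printf "%d children finished\n", scalar keys %$done;
```

    Passing a limit equal to the number of jobs gives the "all concurrently" case, so the two strategies can be timed against each other on the actual workload.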


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
