libvenus has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
Which of the following is the better strategy? The application I am building would fire multiple queries at an app prod/test server (the app server can spawn multiple children to work in parallel) and compare the results (which can contain 10K records each for prod and test). The comparison is complex and would take time:
Single process, multiple threads, using a thread queue containing the work items (the queries).
Single parent process that spawns multiple children to work on the queries in parallel using Parallel::ForkManager.
Considering the application that I am designing, which is the better approach?
Thanks
Re: best strategy
by tilly (Archbishop) on Aug 25, 2008 at 06:38 UTC
For an extreme example, if you're using Windows and are expecting to bottleneck on local CPU on a 1-CPU machine, you absolutely should make this job a single process, that is single-threaded.
Suppose that you're bottlenecked on network time delays and there is an Oracle database connection needed per worker. Then you really want several persistent workers. Single process, multiple threads would beat constant forking.
Suppose that you're bottlenecked on disk seek time, you're on a Unix system, and there are no startup costs. Then I would recommend the fork approach.
Suppose that you're bottlenecked on
Every one of these solutions and more have been successfully used. Every one has advantages and cases where it is best. Anyone who gives you an absolute answer saying that one of them is always the right way to go doesn't know what they are talking about.
I didn't really answer your question. But hopefully I gave you enough to think about that you can have a better chance of coming up with the right solution for your situation. Oh, and I gave you a few more options to consider. :-)
Update: I messed up one of my examples. If you're bottlenecked on network round trips then a single machine should be able to run enough copies to move the bottleneck to the server on the other end. In which case there is no need to complicate things with the cluster. But if CPU is your problem then you would want to split up work onto multiple machines.
by libvenus (Sexton) on Aug 25, 2008 at 07:37 UTC
What operating system are you running on? A Unix flavour.
What are you expecting to be your performance bottlenecks? Processing speed and memory utilization.
What kind of hardware are you working on? Minimum 4 CPUs available, maximum 12.
Are there significant initialization costs, and how much data needs to be passed around? I have to read many queries, which are in very big files, around 500 in number. The output of the queries can also be bulky. Then I need to compare them. Maximizing speed with minimum memory overhead is what I am trying to achieve.
Is there any possibility of moving this to a cluster? Not sure right now...
Well, I have received some valuable advice from various monks in the thread "Problem in Inter process Communication", though I still cannot decide.
Re: best strategy
by BrowserUk (Patriarch) on Aug 25, 2008 at 09:55 UTC
One cannot help but think that at this point your best strategy would be to actually start writing some code.
In the time since you first asked this question, you could have written prototypes of both a forking and a threaded solution and now be in a position to do some empirical tests to determine which works best on your particular setup.
With a little care, the subroutines for issuing the queries, and comparing the results, should be reusable by both prototypes without change. You have already written code for the threading infrastructure. Knocking up a forking equivalent using Parallel::ForkManager should be relatively simple.
Once you have both, you will be in a position to make some real progress on deciding which is going to work best in your environment, as well as deciding if moving to a Perl solution is really going to produce any benefit over your existing C++ solution.
On the basis of the sparse information you've provided, spread across your 3 threads on this subject, my gut feel is that a threaded solution will be most flexible and efficient. As your comparisons seem to be consuming the bulk of the time, using a reusable pool of workers will have less startup overhead and cause the least memory thrashing. It will also require the least amount of infrastructural overhead to control the asynchronicity.
But attempting to draw any conclusions is only ever going to be speculation, given your lack of information regarding the performance of the hardware setup (network bandwidth and latency), the spread of inherent hardware parallelism available (from 4 to 12 CPUs), and the fact that the only measure of where the bottlenecks of the existing system lie is hearsay that "the comparison is where most of the time is spent".
The only ways you are going to come up with any definitive answers is
And the latter approach will be quicker to do; require less in-depth knowledge; and provide the most accurate results.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
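For readers following along, here is a minimal sketch of what such a forking prototype might look like using only core Perl (fork, pipe and wait), on the assumption that Parallel::ForkManager cannot be installed. The names run_query, compare_results and run_forked are hypothetical, as are the query names; run_query is a toy stand-in that fakes one mismatch rather than talking to a real server. The point is only that the query and comparison subroutines stay independent of the worker mechanism, so a threaded prototype could call the same two subs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(_exit);

# Hypothetical stand-in: a real version would send $query to $server.
# Here the 'test' server drops one row for query 'q2' to fake a mismatch.
sub run_query {
    my ($server, $query) = @_;
    my @rows = map { "$query-row$_" } 1 .. 3;
    pop @rows if $server eq 'test' && $query eq 'q2';
    return \@rows;
}

# Count the row positions at which the two result sets differ.
sub compare_results {
    my ($prod, $test) = @_;
    my $max = @$prod > @$test ? @$prod : @$test;
    my $diffs = 0;
    for my $i (0 .. $max - 1) {
        my ($p, $t) = ($prod->[$i], $test->[$i]);
        $diffs++ unless defined $p && defined $t && $p eq $t;
    }
    return $diffs;
}

# Run every query in its own child, with at most $max_children alive at once.
# Each child reports "query<TAB>diff count" to the parent over a pipe.
sub run_forked {
    my ($max_children, @queries) = @_;
    my (%reader_for, %diffs_for);    # pid => read handle, query => diff count

    # Wait for any one child to finish, then read its one-line report.
    my $reap_one = sub {
        my $pid = wait();
        die "no child left to reap" if $pid == -1;
        my $fh = delete $reader_for{$pid} or return;
        my $line = <$fh>;
        close $fh;
        return unless defined $line;
        chomp $line;
        my ($query, $diffs) = split /\t/, $line;
        $diffs_for{$query} = $diffs;
    };

    for my $query (@queries) {
        $reap_one->() while keys(%reader_for) >= $max_children;
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {             # child: do the work, report, leave
            close $reader;
            my $prod = run_query('prod', $query);
            my $test = run_query('test', $query);
            print {$writer} join("\t", $query, compare_results($prod, $test)), "\n";
            close $writer;
            _exit(0);                # skip END blocks inherited from the parent
        }
        close $writer;               # parent keeps only the read end
        $reader_for{$pid} = $reader;
    }
    $reap_one->() while %reader_for;
    return \%diffs_for;
}

my $diffs = run_forked(2, qw(q1 q2 q3));
printf "%s: %d differing row(s)\n", $_, $diffs->{$_} for sort keys %$diffs;
```

With the toy stand-in, only q2 should report a differing row. The one-line report works because it is far smaller than a pipe buffer; bulky results would need the parent to read while the child is still running, and query names containing tabs would break the simple split.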
by libvenus (Sexton) on Aug 25, 2008 at 12:32 UTC
Well, I have done exactly that, though I am unable to use Parallel::ForkManager as it is not available (in my env).
Using threads
Using multiple processes
by BrowserUk (Patriarch) on Aug 25, 2008 at 13:20 UTC
I hate to say this! But if, as you say, your job is dependent upon your solution to this project, I seriously suggest that you seek help from a local mentor with access to the code, data and hardware.
Everything about the code you've posted suggests to me that you do not have the experience to tackle a project of this nature when your job is on the line as a result.
I seriously wish you the very best of luck, but you need more help than can reasonably be provided through a forum such as this.
by Zen (Deacon) on Aug 25, 2008 at 14:36 UTC
by BrowserUk (Patriarch) on Aug 25, 2008 at 14:48 UTC
by Corion (Patriarch) on Aug 25, 2008 at 12:49 UTC
Regarding Parallel::ForkManager not being "available": what restriction keeps you from installing/using it? Yes, even you can use CPAN.
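As a rough illustration of the point, a script can add a private module directory to its search path and then check at run time whether a module is loadable. This is only a sketch: the ~/perl5/lib/perl5 path is an assumption (a common location for modules installed into a home directory) and have_module is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumption: modules were installed under ~/perl5; adjust the path to
# wherever your private install actually lives.
use lib join('/', $ENV{HOME} // '.', 'perl5', 'lib', 'perl5');

# Return 1 if the named module can be loaded, 0 otherwise.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Pick a worker strategy based on what is actually installed.
my $strategy = have_module('Parallel::ForkManager')
    ? 'Parallel::ForkManager'
    : 'plain fork/wait';
print "Worker strategy: $strategy\n";
```

A check like this lets the same script run on machines with and without the module, falling back to core fork and wait where it is missing.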
Re: best strategy
by moritz (Cardinal) on Aug 25, 2008 at 07:37 UTC