
System Performance

by Massyn (Hermit)
on Aug 23, 2007 at 01:03 UTC ( [id://634552] )

Massyn has asked for the wisdom of the Perl Monks concerning the following question:

#!/fellow/monks.pl

I wrote a CPU benchmarking script (at http://www.massyn.net/?p=131) a few weeks back to compare system performance at home with our big IBM AIX servers (just because I'm curious). I realized that using a "prime number" counter is probably not the best way to go, since the cost of finding each new prime grows as the numbers get bigger, so the numbers being returned won't be entirely accurate.

Ok, regardless of that fact, what I did find very strange was that my cpu_bench.pl script performed much slower on a big IBM P690 Regatta (AIX 5.3) than it did on my AMD Athlon XP2600 (running Windows XP, ActiveState Perl 5.6).

Now here's my question: when writing a system performance script, what should I do or not do? It looks like the different versions of perl (between Windows & AIX) do things differently, or my code is just not optimized enough to handle different hardware layers. Maybe Perl just isn't the language to do this in...

What do you say, gentle monks? Have you done something similar? What can we use to determine system performance, both CPU and storage, so I can compare my desktop with my enterprise server and make sure I'm getting the right level of service?
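For what it's worth, Perl's core Benchmark module gives steadier numbers than hand-rolled wall-clock timing, since it reports CPU time. A minimal sketch (the trial-division prime counter here is a hypothetical stand-in for whatever cpu_bench.pl actually does, not taken from it):

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical stand-in workload: count primes below a bound
# by trial division, roughly the kind of loop cpu_bench.pl runs.
sub count_primes {
    my ($limit) = @_;
    my $count = 0;
    CANDIDATE: for my $n (2 .. $limit) {
        for (my $d = 2; $d * $d <= $n; $d++) {
            next CANDIDATE if $n % $d == 0;
        }
        $count++;
    }
    return $count;
}

# timethese runs each sub a fixed number of times and reports
# user/system CPU time, which is more comparable across machines
# than elapsed wall-clock time on a loaded box.
timethese(10, {
    primes_10k => sub { count_primes(10_000) },
});
```

Note this still only measures single-threaded integer work on one core, which is exactly the limitation discussed in the replies below.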

Thanks!

     |\/| _. _ _  ._
www. |  |(_|_>_>\/| | .net
                /
The more I learn the more I realise I don't know.
- Albert Einstein

Replies are listed 'Best First'.
Re: System Performance
by BrowserUk (Patriarch) on Aug 23, 2007 at 04:21 UTC

    The problem is that whilst the Regatta has 32 processors and a huge memory bandwidth, your single-tasking benchmark will only utilise a tiny fraction of all that power. It will only ever run on one of the POWER4 processors, which were introduced in 2001 and run at a now stately 1.1 or 1.3 GHz.

    There is a whole sub-industry devoted to constructing, maintaining and running specialised benchmarks for this kind of hardware to generate headline-grabbing numbers. If you could run one of those on your home machine, it would fare very badly by comparison, despite being (I'm guessing) two or three Moore's-law generations newer hardware.

    Generalised hardware benchmarks are generally pretty useless. A slower (GHz) machine with a top-of-the-range video card will outperform a faster machine with a bad one. Benchmarks only work to the extent that they reflect the realistic operations they are a substitute for.

    And a simple, single-tasking Perl script doing a little repetitive math won't begin to exercise the potential of even a dual-core or dual-processor system, let alone the kind of two-cores-per-die, 4-dies-to-a-card, 4-cards-to-a-box machine like the p690 with its "variable frequency 'distributed switch', wave-pipelined expansion bus".

    It's another indication that the future is multi-tasking, and that languages that rely on the programmer to partition their algorithms, and use fork and pipes or sockets to distribute and coalesce the data, are doomed to disappear.

    Take your prime sieve as an example. It is almost impossible to distribute the processing of a sieve across processors using fork. But it's easy to set multiple threads running that increment the shared candidate counter and then scan the shared sieve array, 'striking off' multiples of their candidate.
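    A minimal sketch of that idea, using Perl's core threads and threads::shared modules (the thread count, sieve limit, and sub name are my own choices, not from the OP's script):

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $limit = 1000;
my @sieve :shared = (1) x ($limit + 1);  # shared array; 1 = "still prime"
my $candidate :shared = 1;               # shared candidate counter

sub worker {
    while (1) {
        my $n;
        # Each thread atomically claims the next candidate.
        { lock $candidate; $n = ++$candidate; }
        last if $n * $n > $limit;        # no multiples left to strike
        next unless $sieve[$n];          # skip already-struck composites
        # Strike off multiples of this candidate in the shared sieve.
        for (my $m = $n * $n; $m <= $limit; $m += $n) {
            $sieve[$m] = 0;
        }
    }
}

my @threads = map { threads->create(\&worker) } 1 .. 4;
$_->join for @threads;

my @primes = grep { $sieve[$_] } 2 .. $limit;
print scalar(@primes), " primes up to $limit\n";   # 168
```

    A thread may occasionally strike multiples of a number another thread is about to rule out; that wastes a little work but never marks a prime as composite, so the result stays correct.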

    Not a convincing example? Then consider manipulating very large digital images, say digital X-rays or CAT or PET scans. Or searching and matching huge strings like genome sequences.

    Threading is coming, like it or not. It's just a matter of which languages are going to make using it easiest.

    (And sorry for hijacking your question to grind my axe :)


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Threads are known to work well only on middle-ground 2-4 CPU machines; beyond that they start hitting the memory-locking wall. And when it comes to things "coming", most of the industry thinks that something like transactional memory is what's actually coming.

      Not to mention that when you look at efficient multi-CPU systems out there, you start noticing designs like Erlang's (granted, what they're using is lightweight threads, but that's not the power of their solution).

      Threads are here, they are ugly, and we're growing out of them pretty fast.

        ... most of the industry thinks that things like transaction memory is the thing that's coming.... Threads are here, they are ugly, and we're growing out of them pretty fast.

        Do you not see the contradiction in those two statements?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

      Just curious: can't you just test processes? I mean like this (from your favourite shell, say) on a unixlike system:

      % cat run
      for i in {1..$1}
      do
          cpu-intensive-task &
      done
      wait
      % for t in 100 200 300; do time run $t; done   # and plot the results
      cheers --stephan

        For performance testing, yes. Absolutely.

        My point about threading is simply that if you have a 32-way processor, unless you are constantly running 32+ separate tasks, you're wasting some of that power. However, if some or all of your fewer-than-32 concurrent tasks are set up for threading, then they will benefit (a little or a lot) whenever there are fewer than 32 tasks running.

        And there are many tasks, like the OP's primes algorithm and the other examples I cited, that do not lend themselves to being multi-tasked through forking, because they need access to shared data.

        All the problems with threading lie with the nature of the low-level abstractions for controlling shared memory access. The language that makes that easier, preferably transparent, will clean up in the future.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: System Performance
by jbert (Priest) on Aug 23, 2007 at 11:08 UTC
    What you should always do is test something which is as similar as possible to what you care about.

    That's all it comes down to really.

    In this particular case, the big-iron machine *is* slower than your desktop for a single-threaded load (as others have pointed out).

    If what you care about is a non-parallelisable problem, your home machine *will* run it faster than the big box.

    If what you care about is producing numbers which show your big box is fast (which is fair enough, it's fun to play with these things), then do what the others mentioned and run multiple cpu crunchers and aggregate the results.

    But this is only CPU, of course. If your problem has a 2 GB dataset, your desktop has 1 GB of RAM and the big box has plenty, then you'll see a big difference depending on how much RAM you use in your performance test.

    So... perf comparisons between hardware come down to how accurately you can model the load you care about. Which is generally limited by how well you understand it and how well you can reproduce 'real world' conditions. (A thousand users on slow modems nibbling away at your app, plus 10% on screaming broadband, can produce a very different load to 100 looping procs on another box on your LAN.)

Re: System Performance
by grinder (Bishop) on Aug 23, 2007 at 13:19 UTC

    I'd stake a beer or two on one area where your IBM big iron will walk all over your piddly desktop, and that is raw I/O throughput.

    Try setting up a dozen processes running in parallel, reading and writing files on the disk. For instance, each process writes a few million records to 20 files opened simultaneously. Then go back and read all the files simultaneously (that is, one record from each open file before getting the next), and write those records out to another file. Then delete everything and start again. Do that about 10 times. See how long it takes from the first process launched until all are finished.
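    A rough sketch of the write phase of such a test, using fork (the process, file, and record counts are arbitrary placeholders; the read-back and repeat phases described above are omitted for brevity):

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical scale: 12 writer processes, 20 files each.
my $PROCS = 12;
my $FILES = 20;
my $RECS  = 50_000;

my $start = time;
for my $p (1 .. $PROCS) {
    defined(my $pid = fork) or die "fork failed: $!";
    next if $pid;                        # parent: keep spawning children
    # Child: open all 20 files at once, then interleave writes to them.
    my @fh;
    for my $f (0 .. $FILES - 1) {
        open $fh[$f], '>', "bench.$p.$f" or die "open: $!";
    }
    for my $r (1 .. $RECS) {
        print { $fh[$_] } "proc $p file $_ rec $r\n" for 0 .. $FILES - 1;
    }
    close $_ for @fh;
    exit 0;
}
wait for 1 .. $PROCS;                    # block until every child finishes
printf "write phase: %.2fs\n", time - $start;
unlink glob 'bench.*';                   # tidy up
```

    The point of interleaving writes across many open files from many processes is to defeat per-file buffering and force the I/O subsystem to seek, which is where enterprise storage shows its worth.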

    The idea is to saturate the IO channel on the machine. If my theory is right, your IBM machine will do much better at dealing with IO, and will finish miles ahead of your desktop.

    • another intruder with the mooring in the heart of the Perl

Node Type: perlquestion [id://634552]
Approved by GrandFather