in reply to System Performance

The problem is that whilst the Regatta has 32 processors and huge memory bandwidth, your single-tasking benchmark will only ever utilise a tiny fraction of all that power. It will only run on one of its POWER4 processors, which were introduced in 2001 and run at a now stately 1.1 or 1.3 GHz.

There is a whole sub-industry devoted to constructing, maintaining and running specialised benchmarks for this kind of hardware to generate headline-grabbing numbers. If you could run one of those on your home machine, it would fare very badly by comparison, despite being (I'm guessing) two or three Moore's Law generations newer hardware.

Generalised hardware benchmarks are pretty useless. A slower (GHz) machine with a top-of-the-range video card will outperform a faster machine with a bad one. Benchmarks only work to the extent that they reflect the realistic operations they stand in for.

And a simple, single-tasking Perl script doing a little repetitive math won't begin to exercise the potential of even a dual-core or dual-processor system, let alone the kind of two-cores-per-die, four-dies-per-card, four-cards-per-box machine like the p690 with its "variable frequency 'distributed switch', wave-pipelined expansion bus".

It's another indication that the future is multi-tasking, and that languages which rely on the programmer to partition their algorithms, and on fork and pipes or sockets to distribute and coalesce the data, are doomed to disappear.
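
To make that concrete, the fork-and-pipe version of even a trivially parallel job looks something like this (an untested sketch of my own; the chunk boundaries and the per-chunk "work" are just stand-ins):

    use strict;
    use warnings;

    # Hand-partitioned input; deciding the chunk boundaries is my problem.
    my @chunks = ( [ 1 .. 1000 ], [ 1001 .. 2000 ], [ 2001 .. 3000 ] );
    my @readers;

    for my $chunk ( @chunks ) {
        pipe( my $read, my $write ) or die "pipe: $!";
        defined( my $pid = fork() ) or die "fork: $!";
        if( $pid == 0 ) {                   # child: work on its own chunk
            close $read;
            my $sum = 0;
            $sum += $_ for @$chunk;         # stand-in for the real work
            print {$write} "$sum\n";        # ship the partial result back
            close $write;
            exit 0;
        }
        close $write;                       # parent keeps the read end
        push @readers, $read;
    }

    # Coalesce the partial results in the parent.
    my $total = 0;
    for my $read ( @readers ) {
        chomp( my $partial = <$read> );
        $total += $partial;
        close $read;
    }
    wait() for @chunks;                     # reap the children
    print "total: $total\n";

All the partitioning, plumbing and coalescing is the programmer's problem, and none of it helps when the workers need to share data rather than carve it up.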

Take your prime sieve as an example. It is almost impossible to distribute the processing of a sieve across processors using fork. But it's easy to set multiple threads running that increment a shared candidate counter and then scan the shared sieve array, 'striking off' multiples of their candidate.
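
In outline (an untested sketch using threads and threads::shared; now and then a thread will claim a candidate that another thread hasn't struck off yet, which costs a little redundant striking but never a wrong answer):

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $limit   = 100_000;
    my $workers = 4;

    my @sieve :shared = ( 1 ) x ( $limit + 1 );   # 1 == still a candidate
    my $next  :shared = 1;                        # shared candidate counter

    sub strike {
        while( 1 ) {
            my $c;
            { lock $next; $c = ++$next; }         # claim the next candidate
            last if $c * $c > $limit;             # nothing left to strike
            next unless $sieve[ $c ];             # already struck off; skip
            for( my $m = $c * $c; $m <= $limit; $m += $c ) {
                $sieve[ $m ] = 0;                 # strike off its multiples
            }
        }
    }

    my @kids = map threads->create( \&strike ), 1 .. $workers;
    $_->join for @kids;

    my $count = grep $sieve[ $_ ], 2 .. $limit;
    print "$count primes up to $limit\n";

Every thread works on the same array in place; there is nothing to partition and nothing to coalesce.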

Not a convincing example? Then consider manipulating very large digital images, say digital X-rays or CAT or PET scans. Or searching and matching huge strings like genome sequences.

Threading is coming, like it or not. It's just a matter of which languages are going to make using threads easiest.

(And sorry for hijacking your question to grind my axe :)


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: System Performance
by Eyck (Priest) on Aug 23, 2007 at 07:18 UTC
    Threads are known to work well only on middle-ground 2-4 CPU machines; beyond that they start hitting the memory-locking wall. And when it comes to things "coming", most of the industry thinks that things like transactional memory are what's coming.

    Not to mention that when you look at efficient multi-CPU systems out there, you start noticing designs like Erlang's (granted, what they're using is lightweight threads, but that's not where the power of their solution lies).

    Threads are here, they are ugly, and we're growing out of them pretty fast.

      ... most of the industry thinks that things like transactional memory are what's coming.... Threads are here, they are ugly, and we're growing out of them pretty fast.

      Do you not see the contradiction in those two statements?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: System Performance
by sgt (Deacon) on Aug 23, 2007 at 10:06 UTC

    just curious, but can't you just test processes? I mean like this (from your favourite shell, say) on a unixlike system:

     % cat run
     for i in {1..$1}
     do
         cpu-intensive-task &
     done
     wait
     % for t in 100 200 300; do time run $t; done    # and plot the results
    cheers --stephan

      For performance testing, yes. Absolutely.

      My point about threading is simply that if you have a 32-way processor, unless you are constantly running 32+ separate tasks, you're wasting some of that power. However, if some or all of your fewer-than-32 concurrent tasks are set up for threading, then they will benefit (a little or a lot) whenever there are fewer than 32 tasks running.

      And there are many tasks, like the OP's primes algorithm and the other examples I cited, that do not lend themselves to being multi-tasked through forking, because they need access to shared data.

      All the problems with threading lie in the nature of the low-level abstractions for controlling shared memory access. The language that makes that easier, preferably transparent, will clean up in the future.
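
      By "low-level abstractions" I mean the kind of explicit lock/cond_wait/cond_signal choreography you need for even a trivial shared work queue (an untested sketch, roughly the sort of thing Thread::Queue hides for you):

          use strict;
          use warnings;
          use threads;
          use threads::shared;

          my @queue :shared;

          sub producer {
              for my $item ( 1 .. 10 ) {
                  lock @queue;
                  push @queue, $item;
                  cond_signal @queue;            # wake the consumer if it's waiting
              }
              lock @queue;
              push @queue, undef;                # undef == no more work
              cond_signal @queue;
          }

          sub consumer {
              while( 1 ) {
                  my $item;
                  {
                      lock @queue;
                      cond_wait @queue until @queue;   # sleep until work arrives
                      $item = shift @queue;
                  }
                  last unless defined $item;
                  print "got $item\n";
              }
          }

          my $kid = threads->create( \&consumer );
          producer();
          $kid->join;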


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.