PerlMonks
Re^5: (OT) Programming languages for multicore computers

by BrowserUk (Patriarch)
on May 07, 2009 at 23:40 UTC ( [id://762720] )


in reply to Re^4: (OT) Programming languages for multicore computers
in thread (OT) Programming languages for multicore computers

Intel's first true dual core chip was released April, 2005. AMD's was released in May, 2009.

Hm. Your dates are wrong. AMD's Athlon 64 X2 dual-core was released on May 31, 2005. And their Phenom X3 and X4 followed in 2007.

But that overlooks the fact that both companies were integrating FPUs onto their dies way back when (the 486, circa 1989).

And they've been utilising the extra transistors that have become available through process shrinks (eg. 65nm to 45nm to 32nm) for all sorts of things: growing the number of levels, and the sizes, of on-chip caches over the years; and most recently, as I mentioned above, adding high-speed, low-latency, point-to-point serial/parallel bus technologies--effectively an on-chip, high-speed LAN.

Also, for the last few years Intel have followed their well-known tick-tock strategy. On the tick cycle, they do a process shrink of the current micro-architecture. On the tock cycle, they release a new micro-architecture. With Nehalem being the latest tock, the next cycle will be another process shrink, during which they'll be looking to increase clock rates and/or reduce power consumption.

(And that's another use for the extra transistors I forgot to mention: the incorporation of per-stage clock-frequency scaling and shut-down circuitry, allowing them to slow down or switch off large chunks of the chip independently to conserve power.)

Take all those factors into account and I think you'll see that your no-of-cores projections, even your revised numbers, are still way optimistic. I think we might reach 32-core processors at the high end over the next 5 to 6 years. Just. But it'll take at least another cycle for them to reach the commodity box. And I'd push the 1k-core chips back to the 2030 time frame.

Windows and Linux both scale to 32 CPUs fairly well. Operating systems will appear to scale for the next 5 years, albeit with diminishing returns. The real challenges come in the following 5 years.

Hm. I cannot speak for Linux, but talking about "the Windows OS" in the singular is a mistake.

Vista is a significantly different animal to XP. And I'm not talking about Aero or similar cosmetic differences. I mean deep down in the guts of the kernel where, for example, the scheduler that previously used fixed, time-based quanta as the basis of its round-robin scheduling now uses cycle counting to more fairly apportion the CPU(s).

Then there are beasts like Windows HPC Server, which use radically different kernel structures from the desktop, or even the general-purpose Windows Server variants.

And Windows 7 has already incorporated VM technology--albeit for somewhat dubious reasons--into a mainstream desktop product. And every day there is a new announcement from one of the various players in the virtual-server marketplace of their latest, greatest twist on the theme.

I occasionally need to access websites that simply do not work unless ActiveX is enabled. Previously, I would avoid such sites entirely, but these days I have a copy of XP installed in a Virtual PC--with neither LAN nor real hard-disk access--configured specifically for browsing this kind of website. A virus scanner runs over the relatively small virtual hard disk each time I boot the image--and if it ever detects anything it cannot remove, I just dump the virtual disk and deploy a fresh copy. Effectively it is a single-application virtual OS. It's not hard to see how this technique will be integrated deep into Windows 8.

(I'm aware that the better-supported Linux distributions come in several flavours--I'm just not equipped to comment on them.)

When I was talking about scalability limits, I was not talking about the scalability limits of the algorithms, but about the scalability limits of conventional operating systems.

Basically, what I'm saying is that it is naive to predict the future on the basis of hyper-evolution of the hardware whilst simultaneously having software development stagnate. There are already strong indications of the ways in which OSs are going to evolve. And hypervisors--with all their abilities to segregate and route disk and network IO, even that originating from multiple concurrent OSs--are beginning to move under the OS, closer to the hardware.

In addition, if we can have SPUs for FP, graphics and audio, why not dedicate one core per chip to IO? You'll remember the little-mentioned 80186/80188 variants of the x86 architecture, with their specialist IO instructions (ins/outs) and extra dedicated IO hardware. Throw one of those on the chip as one of the 8 cores and give it sole responsibility for external IO. Have it reading and writing directly to physical memory pages that lie in a range of the 64-bit physical address space dedicated to that purpose. But map those physical pages into the virtual address spaces of the processes doing the IO. Have dedicated IO threads in each process. You've now removed DMA contention between the CPUs and main memory, and between the IOPU and the IO channels. This is not a new idea. Mainframes had dedicated IO processors offloading that workload from the main CPUs decades ago. Eg. the IBM 3174 and 3274 control units for the 370 processors.
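The user-space half of that idea can be sketched in Perl today: one dedicated IO thread per process owns the output channel, and the worker threads hand it completed buffers through a queue rather than touching the channel themselves. (The thread and queue names are mine, purely illustrative.)

```perl
#!/usr/bin/perl
# Sketch: a single dedicated IO thread per process, fed via a queue --
# a user-space analogue of the dedicated-IO-core idea above.
use strict;
use warnings;
use threads;
use Thread::Queue;

my $io_q = Thread::Queue->new;

# The lone "IOPU" thread: the only thread that touches the output channel.
my $io_thread = threads->create( sub {
    while ( defined( my $buf = $io_q->dequeue ) ) {
        print $buf;                     # all external IO funnels through here
    }
} );

# Worker threads never do IO themselves; they enqueue finished buffers.
my @workers = map {
    my $id = $_;
    threads->create( sub {
        $io_q->enqueue( "worker $id: result\n" );
    } );
} 1 .. 4;

$_->join for @workers;
$io_q->enqueue( undef );                # sentinel: tell the IO thread to finish
$io_thread->join;
```

Note that the workers and the IO thread never contend for a lock themselves; Thread::Queue serialises the hand-off for them.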

You cannot achieve that address bus contention reduction using single threaded processes!

However straightforwardly multi-threaded programs are not going to be magically exempt from scalability issues...

I never said they would be. But why do we have to confine ourselves to "straightforwardly multi-threaded programs"?

When 2 threads that are remote from each other both access a piece of memory which needs to look consistent between them, there has to be locking under the hood at some level to make that happen.

Firstly, that "When" should be "If". Whilst there might be large volumes of notionally "shared state" within a threaded application, with the right algorithms and partitioning very little of it actually needs to be accessed concurrently by multiple threads.

Even at the OS level, shared state tends to be used serially. Take IO buffers as an example. First the process writes the buffer--it then transfers it to the OS for output; or the OS fills a buffer from disk or network and then hands it over to the process.

And existing OSs are already adept at managing the consistency of duplicate memory images--think cache coherency--mostly using hardware assist. There's no reason that similar mechanisms cannot be used to protect a process' per-thread storage, one from the other.
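As a minimal sketch of that partitioning argument: each thread below works entirely in its own lexicals, and the only cross-thread traffic is each thread's single return value, handed over serially at join time. No locks anywhere.

```perl
#!/usr/bin/perl
# Partitioned summation: notionally "shared" data, but nothing is ever
# accessed concurrently -- each thread sums its own slice in thread-local
# lexicals and hands back one number at join time.
use strict;
use warnings;
use threads;

my @data      = 1 .. 1000;
my $n_threads = 4;
my $chunk     = @data / $n_threads;

my @threads = map {
    my @slice = @data[ $_ * $chunk .. ( $_ + 1 ) * $chunk - 1 ];
    threads->create( sub {
        my $sum = 0;                    # thread-local; no locking needed
        $sum += $_ for @slice;
        return $sum;                    # handed over serially at join
    } );
} 0 .. $n_threads - 1;

my $total = 0;
$total += $_->join for @threads;
print "$total\n";                       # prints 500500
```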

With traditional multi-threaded programs, there are no guarantees on any part of memory being shared, and this leads to complications.

The second main theme of my original post was dedicated to explaining how languages need to evolve to support the programmer in the use of threading. And the primary way in which they need to evolve is to provide compiler- and runtime-enforced (possibly with hardware assist) clear delineation between thread-local, non-shared variables (in Perl's terms, "lexicals") and explicitly shared global variables.

threads already does this, but then screws it all up by insisting on automatically cloning vast chunks of stuff whether you need it or not. Haskell and other FP languages, by their very nature, already ensure that local variables are 'thread-safe'. But in the process they either take away access to shared state completely, or over-complicate its use through dogmatic design.

It's not that hard to see the possibility of a language designed from the ground up with threading in mind, providing the best of both worlds: compiler-enforced, non-shared, thread-local variable spaces, and explicitly programmer-controlled, explicitly-shared global variable spaces.

Combine those two things with some of the new threading constructs I outlined, along with algorithms explicitly adapted to avoid locking and synchronisation, and you end up with multi-threaded programming that is no more complex than the techniques currently quite commonplace in process-based concurrency--ie. file sharing and locking, DB transactions and the like--but with all the benefits of in-process shared-state accessibility and performance.

it may take 20 years for it to become widely understood that massively multi-threaded systems don't scale that well. And when that happens I'll be able to say, "I knew that ages ago!"

I assume that, like me, you are just an informed observer with no special insight into what the hardware fabs or software labs have on their drawing boards. We are both trying to look at history and the current state of things, and use our experiences to date to predict where things will go. We differ in our conclusions, but that may be down to our differing backgrounds. Maybe the reality will land somewhere between us.

One thing I do hope is that this place will still be around so that we can come back and see how close--or not--we got; in 5, 10, 15 & 20 years' time. Flesh willing.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.