When I was updating the slightly-outdated PDL::Dataflow doc recently, it occurred to me that:
- Perl is single-threaded and there are various ways to get round that, but no amazing ones
- PDL's current multi-core functionality (which is indeed shared-memory, using POSIX threads) blocks the main thread until all the broadcast operations are finished
- it doesn't use the GPU at all yet
- so long as only one POSIX thread is running Perl, there's no reason PDL couldn't fire off other pthreads (or, in due course, GPU operations), then "await" (pun intended) or react to their completion asynchronously with a suitable event loop
That would be a lot closer to proper SMP, within Perl. My limited knowledge of parallel programming suggests that parallel code is hard to reason about unless you have a "main" thread that's in overall charge, which this model would retain.