When I was updating the slightly-outdated PDL::Dataflow doc recently, it occurred to me that:
Perl is single-threaded, and there are various ways to get round that, but no amazing ones;
PDL's current multi-core functionality (which is indeed shared-memory, using POSIX threads) blocks the main thread until all the broadcast operations have finished;
it doesn't use the GPU at all yet;
so long as there's only one POSIX thread running Perl, there's no reason PDL couldn't fire off other pthreads (or, in due course, GPU operations), then "await" (pun intended) / react to their completion asynchronously with a suitable event loop.
That would be a lot closer to proper SMP, within Perl. My limited knowledge of parallel programming suggests that it's hard to reason about unless you have a "main" thread that's in overall charge, which this model would retain.
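
To make the shape of that model concrete, here is a minimal sketch in Python rather than Perl, since PDL has no such async API today. It only illustrates the pattern: one main thread runs an event loop, hands chunks of work to worker threads, keeps doing other things, and awaits the results. The names `sum_of_squares`, `heartbeat` and `run_chunks` are made up for the illustration and correspond to nothing in PDL.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Hypothetical stand-in for one slice of a broadcast operation.

    Runs on a worker thread, not on the main thread.
    """
    return sum(x * x for x in chunk)


async def heartbeat(ticks, count=3):
    """Other work the main thread stays free to do while the workers run."""
    for _ in range(count):
        ticks.append(threading.get_ident())  # record which thread ran this
        await asyncio.sleep(0)               # yield back to the event loop


async def run_chunks(chunks):
    """Fire off one worker per chunk, then await them all from the main thread."""
    loop = asyncio.get_running_loop()
    ticks = []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # run_in_executor returns immediately with an awaitable future,
        # so the main thread is not blocked while the workers compute.
        futures = [loop.run_in_executor(pool, sum_of_squares, c) for c in chunks]
        partials, _ = await asyncio.gather(
            asyncio.gather(*futures),  # results come back in submission order
            heartbeat(ticks),
        )
    return partials, ticks


partials, ticks = asyncio.run(run_chunks([range(0, 4), range(4, 8)]))
print(sum(partials))  # 140, the sum of squares of 0..7
```

The main thread stays in overall charge throughout: it decides what to launch and when to collect results, and the workers never touch the event loop themselves.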