> What I really like is that your code is mostly standard C++ that would appear to just work on just about any modern hardware. Is that right?
Replacing std::mutex with spinlock_mutex resolved the issue of nvc++ taking over 40 seconds to build llil4hmap.cc and llil4emh.cc. The behavior is peculiar, so I reached out to NVIDIA about it. All is well otherwise: the binary performs similarly to one built with clang++.
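For reference, here is a minimal sketch of what such a spinlock_mutex can look like; the name matches the code above, but this particular body is an assumption on my part, built on std::atomic_flag so it meets the Lockable requirements:

    #include <atomic>

    // Minimal spinlock sketch (assumed implementation, not necessarily
    // the exact spinlock_mutex used above). Satisfies the Lockable
    // requirements, so std::lock_guard and std::unique_lock work with it.
    class spinlock_mutex {
        std::atomic_flag flag = ATOMIC_FLAG_INIT;
    public:
        void lock() noexcept {
            // Spin until we observe the flag clear and set it ourselves.
            while (flag.test_and_set(std::memory_order_acquire))
                ;  // busy-wait
        }
        void unlock() noexcept {
            flag.clear(std::memory_order_release);
        }
    };

It drops in wherever std::mutex was used, e.g. std::lock_guard<spinlock_mutex> guard(mtx);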
> Would your C++ code automatically scale when run on a beast GPGPU machine with, say, six high end NVIDIA graphics cards?
Regarding llil4map/vec, the first step is replacing the OpenMP directives with C++17 (or later) parallel and vector concurrency via execution policies. The get-properties phase may not run on the GPU, but there is no reason why sorting cannot. However, the GPU may lack sufficient memory capacity for the LLiL challenge.
From the article, "The GPU version also uses the parallel execution policy, but is compiled with nvc++ and the -stdpar compiler option."
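A minimal sketch of that approach, assuming a plain std::vector of integers rather than the actual LLiL key/count data: the same source runs on CPU threads under g++/clang++ (link against TBB with GNU libstdc++) and may be offloaded to the GPU when compiled with nvc++ -stdpar.

    #include <algorithm>
    #include <execution>
    #include <random>
    #include <vector>

    int main() {
        // Fill a vector with pseudo-random values to stand in for real data.
        std::vector<long long> data(10'000'000);
        std::mt19937_64 rng(42);
        for (auto& x : data) x = static_cast<long long>(rng());

        // Parallel execution policy: CPU threads with g++/clang++,
        // potentially GPU-offloaded with nvc++ -stdpar.
        std::sort(std::execution::par, data.begin(), data.end());
    }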
Accelerating Standard C++ with GPUs Using stdpar
Accelerating Python on GPUs with nvc++ and Cython
NVIDIA HPC Standard Language Parallelism, C++ (PDF document)
See also C++ Parallel Algorithms.
At the time of writing, the NVIDIA HPC SDK is at version 23.9, released recently.