Fascinating work, marioroy!
What I really like is that your code is mostly standard C++ that should just work on almost any modern hardware. Is that right?
I remember desperately grovelling around with all sorts of system-specific hacks in The 10**21 Problem series -- such as pre-fetching, TLB, and Intel intrinsics (e.g. search for _mm_ in The 10**21 Problem (Part 4)) -- to get the performance I needed.
So it seems like a dream to write standard C++ that automatically performs well on all modern hardware. For example, with NVIDIA's nvc++ compiler, would your C++ code automatically scale when run on a beast GPGPU machine with, say, six high-end NVIDIA graphics cards?
👁️🍾👍🦟
In reply to Re: [OT] The Long List is Long resurrected by eyepopslikeamosquito, in thread [OT] The Long List is Long resurrected by marioroy