Thanks for the Inline C approach. It showcases one of Perl's strengths: when we need the extra performance, we can drop down to C (via XS) and glue it back into Perl-land, all with minimal effort.
I'm curious - how long does my Perl primes(3_000_000) take on your machine?