It just comes down to speed. The real case applies some fairly complex logic to multiple large bit vectors (each over 1 million bits), which runs a lot faster in C. What I have works well, but as noted in the OP, I'm unsure whether my approach is risky with respect to Perl's internals.