I was lucky and discovered NYTProf quite early, so along with loop testing, I was able to get rid of major bottlenecks during development, which was and is kind of excellent. There are some optimization tricks on that perlperf page that I was not aware of, so those should help too. I've been using microtimers in loops, which achieve the same result, but I'll check out some of the other optimization tools, thanks. As noted above, sadly, once I made both test versions fully apples to apples, the improvement turned out to be roughly 'only' 2.5x faster for tr and length vs regex. But equally obviously, anything that results in that big of a difference is worth understanding better, since usually you hope for 5 or 10% improvements, not 250%.
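
For anyone who wants to try this kind of comparison themselves, here is a minimal sketch using the core Benchmark module. The actual test code isn't shown in this thread, so the task (counting commas), the `@lines` data, and the sub names `count_tr` and `count_regex` are all assumptions made up for illustration. The point is only the apples-to-apples part: both subs do the same work on the same input, and the results are checked for equality before anything is timed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data (assumption): 50 strings with 0 to 49 commas each.
my @lines = map { join ',', ('field') x $_ } 1 .. 50;

# Count commas with tr///; with an empty replacement list and no /d,
# tr only counts matching characters and leaves the string unchanged.
sub count_tr {
    my $total = 0;
    for my $s (@lines) {
        $total += ( $s =~ tr/,// );
    }
    return $total;
}

# Count commas with a global regex match; assigning the match list
# to () and then to a scalar yields the number of matches.
sub count_regex {
    my $total = 0;
    for my $s (@lines) {
        my $n = () = $s =~ /,/g;
        $total += $n;
    }
    return $total;
}

# Apples to apples: refuse to time anything unless both versions agree.
die "results differ\n" unless count_tr() == count_regex();

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese( -1, {
    tr    => \&count_tr,
    regex => \&count_regex,
} );
```

cmpthese prints a small table of rates with the relative difference between the two, which is an easy way to see whether a speedup holds up once both versions really do the same job. The ratio you get will depend on your data and Perl version, so treat any numbers from this sketch as specific to your own machine.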