I'm of the opinion that the understanding gained by profiling is worth the effort even in simple cases. Once you are trying to speed code up, even educated guesses are often wrong. And Devel::NYTProf (which I linked to above) does both line-level and block-level profiling, so it would tell you directly that the problem is that these loops run too often. That, hopefully, would lead whoever is optimizing to start looking for ways to run them less often.
I'll admit that, when looking at profiler output, there is a temptation to focus on individual lines; but profiling can and does help just as much at a broader scale, as long as the programmer resists that temptation.