That's something I insist upon when I'm da boss: Don't optimize until you know where the time is spent.
I recently brought a long-running process down from 2+ hours per invocation to just over five minutes, using aggressive caching (and lots of RAM!), but I would have guessed wrong had I not profiled first. It turned out that the chunk I thought would be the biggest CPU hog was actually number six on the list, and not worth fooling with.
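For anyone who wants to try the same workflow, here is a minimal Python sketch of "profile first, then cache the real hot spot". It is not the actual process described above: `slow_lookup`, `cached_lookup`, `run_job`, and `profile` are hypothetical names made up for illustration, and the workload is a toy stand-in. It uses the standard library's `cProfile`/`pstats` to rank functions by cumulative time and `functools.lru_cache` for the RAM-for-CPU trade.

```python
import cProfile
import functools
import io
import pstats


def slow_lookup(key):
    """Hypothetical expensive step; stands in for whatever the real job recomputes."""
    total = 0
    for i in range(20_000):
        total += i * i
    return total + key


@functools.lru_cache(maxsize=None)  # unbounded cache: spends RAM to save CPU time
def cached_lookup(key):
    return slow_lookup(key)


def run_job(lookup, n=200):
    # Only 10 distinct keys are used, so most calls repeat earlier work.
    return sum(lookup(i % 10) for i in range(n))


def profile(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    # Step 1: measure before touching anything.
    _, report = profile(run_job, slow_lookup)
    print(report)

    # Step 2: cache only what the profile showed to be hot, then measure again.
    _, cached_report = profile(run_job, cached_lookup)
    print(cached_report)
    print(cached_lookup.cache_info())
```

The point of running the profiler twice is the same as in the story: the first report tells you which function is actually at the top of the list, and the second confirms the cache moved the needle. An unbounded cache is only reasonable when the key space fits in memory; otherwise give `maxsize` a limit.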