I second this approach. Start by having the script output just a header and a simple "Hello world" message. Next, add the code that pulls your header from the file, then add code that opens and displays a small, one-line file, and finally run your current implementation. Comparing how long each of those steps takes will help you narrow down where the performance problem is.
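One way to collect those data points is to log the elapsed time after each stage. The Perl sketch below assumes a plain CGI script. The file names `header.html` and `one_line.txt` and the helpers `mark` and `print_file` are hypothetical placeholders, not anything from your code. The timings go to STDERR, which most web servers write to their error log rather than to the browser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical paths: replace with the header and data files your script reads.
my $header_file = 'header.html';
my $small_file  = 'one_line.txt';

my $start = time();

# Write the elapsed time since startup to STDERR (usually the server's error log).
sub mark {
    my ($label) = @_;
    printf STDERR "%-12s %.4f s\n", $label, time() - $start;
}

# Copy a file to STDOUT line by line; warn and return 0 if it cannot be opened.
sub print_file {
    my ($path) = @_;
    open my $fh, '<', $path or do { warn "Cannot open $path: $!\n"; return 0 };
    print while <$fh>;
    close $fh;
    return 1;
}

# Stage 1: a bare header and "Hello world".
print "Content-Type: text/html\n\n";
print "<p>Hello world</p>\n";
mark('hello');

# Stage 2: pull the header from a file.
print_file($header_file);
mark('header file');

# Stage 3: open and display a small, one-line file.
print_file($small_file);
mark('small file');

# Stage 4: call your current implementation here and mark() it the same way.
```

If the jump happens between two `mark` lines, that stage is where to look first.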
If this approach turns up anything useful, post back with the details.
Question: is there any reason you're not using CGI.pm to print your HTML header?
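For reference, a minimal sketch of letting CGI.pm build the header, assuming the module is installed (it was removed from the Perl core distribution in 5.22, so newer Perls may need it from CPAN). Note that `header()` produces the HTTP header block (the `Content-Type` line plus the blank line), not the HTML `<head>` section; the content type and charset values here are just example choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

my $q = CGI->new;

# header() returns the HTTP header block, including the terminating blank line,
# so the script no longer has to build "Content-Type: ...\n\n" by hand.
my $header = $q->header(-type => 'text/html', -charset => 'UTF-8');
print $header;

print "<p>Hello world</p>\n";
```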