in reply to speed issues

I'm not yet completely grokking your pseudocode, but it appears that you are doing way more work "finding" files than you need to. Why do you need to find things 3 separate times? Oops, make that 4. If the directories are very large (many, many files), each find is going to be very expensive. So, suggestion number one is to figure out a better way of getting at your data without having to search through several large directories multiple times.
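Since I haven't seen your actual code, this is only a rough sketch of the idea in Python; the root directory and the match rules are made up, so substitute your own:

    import os

    # One walk over the tree, sorting each file into whichever bucket it
    # belongs to, instead of running a separate find for each pattern.
    logs, configs, sites = [], [], []
    for dirpath, dirnames, filenames in os.walk("/path/to/data"):  # hypothetical root
        for name in filenames:
            full = os.path.join(dirpath, name)
            if name.endswith(".log"):        # made-up match rule
                logs.append(full)
            elif name.endswith(".conf"):     # made-up match rule
                configs.append(full)
            elif name.startswith("site"):    # made-up match rule
                sites.append(full)

One pass over the tree gets you all three lists, so you pay the disk cost once instead of three or four times.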

It also appears that you may be iterating over the same information multiple times. You're getting a list of sites first, then counting how many there are? Count them as you find them, if that's actually what's happening. Thus, suggestion two is to figure out how to iterate over the data once, getting the info you need and processing it at the same time.
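Again, just a sketch with made-up names, but the point is to do the counting (and whatever per-site work you need) inside the same loop that discovers the sites, rather than building a list, counting it, and then walking it again:

    import os

    # Count and handle each site in the same pass that finds it.
    site_count = 0
    for dirpath, dirnames, filenames in os.walk("/path/to/data"):  # same hypothetical root
        for name in filenames:
            if name.startswith("site"):      # made-up match rule
                site_count += 1
                print("processing", os.path.join(dirpath, name))  # your real per-site work goes here
    print(site_count, "sites")

Each item gets touched exactly once, which should help a lot when the directories are big.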

-Scott