Perhaps reframing the problem might give you some insight. It's virtually impossible, on a large system, to "pre-guess" what my performance issues are going to be. I might have an idea of how to improve the speed of a routine by 50%, but if it only takes 1% of the program's total execution time, then my programming time can probably be more profitably spent elsewhere.
Unless you have a known systemic issue (CAD rendering, heavy statistical analysis, etc.), your time will usually be more profitably spent building the system so that it works to spec. If you have any areas that you suspect will be problematic, take the trouble to properly encapsulate their functionality so it's easier to rework them later. For example, with your hash, create accessor and mutator methods for it. That way, if you later find it too slow, you can take the trouble to re-engineer it but keep the same API. This will reduce the chance of your program breaking (mind you, this is a good strategy, but it doesn't always work in practice).
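For example (a minimal sketch, with a made-up package name and made-up keys), the hash could be hidden behind a tiny class so that callers never touch the storage directly:

    package ScoreStore;    # hypothetical name -- use whatever fits your data
    use strict;
    use warnings;

    # The hash lives inside the object; callers only see get()/set().
    sub new {
        my $class = shift;
        return bless { data => {} }, $class;
    }

    sub get {                       # accessor
        my ( $self, $key ) = @_;
        return $self->{data}{$key};
    }

    sub set {                       # mutator
        my ( $self, $key, $value ) = @_;
        $self->{data}{$key} = $value;
        return $value;
    }

    1;

If the plain hash later proves too slow, you can swap its guts for a tied hash, a dbm file, or a database table without touching a single caller, because the get()/set() API stays the same.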
The important thing is to actually collect hard information to verify where your program is having performance issues. mod_perl is a great choice to improve overall performance, but the triple-nested loop will still scale poorly, regardless of mod_perl. What I would do is get your application(s) running and then invite people to play it as a beta. Analyze your web logs and find out which programs are being accessed the most frequently (and perhaps the parameters passed to them, if applicable). Then you can use Devel::DProf and Devel::SmallProf to really dig in, find out exactly what your most expensive sections of code are, and work on fine-tuning those.
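For example, once the logs tell you which scripts are hot, profiling them is just a matter of loading the profiler from the command line (myscript.pl here is a stand-in for whichever CGI turns out to be the busiest):

    # Whole-program profile: writes tmon.out in the current directory
    perl -d:DProf myscript.pl
    dprofpp tmon.out            # summarize, sorted by time spent per sub

    # Line-by-line profile: writes smallprof.out
    perl -d:SmallProf myscript.pl

Devel::DProf tells you which subroutines eat the time; Devel::SmallProf narrows it down to individual lines.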
Hope this helps!
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
Also, any idea on how many people could be on a single dedicated server this way at one time before the server gets maxed out?
Completely unknown with the information you've given us. For example, the number of people who could use a single server if it were a P200 with 64 MB is completely different from the number of people who would saturate a 6-CPU, 2 GB E450. (You mention you use Unix.)
One tip: mod_perl lets you keep database connections open (typically via Apache::DBI), so the pain and suffering your scripts go through to open a db handle is paid only once.
In this situation, where your data must not be volatile (i.e. you should be able to recover after a machine failure), I think a database is your best answer. If tuned correctly, most of the data will sit in memory, and the db engine will flush it to disk when it needs to.
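Under mod_perl, a persistent connection is usually just a matter of loading Apache::DBI before DBI; a rough sketch (the DSN, user, and password are made up):

    # In your mod_perl startup.pl:
    use Apache::DBI ();   # must be loaded before DBI to take effect
    use DBI ();

    # In each script or handler -- connect() is transparently cached,
    # so the handshake cost is paid once per Apache child, not per hit:
    my $dbh = DBI->connect(
        'dbi:mysql:database=game;host=localhost',   # hypothetical DSN
        'webuser', 'secret',
        { RaiseError => 1, AutoCommit => 1 },
    );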
Instead of fragmenting your application into hundreds of scripts, check out CGI::Application.
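A bare-bones sketch of how CGI::Application keeps that in one script, dispatching "pages" through run modes (the package and mode names are invented):

    package MyGame::App;        # hypothetical application class
    use strict;
    use base 'CGI::Application';

    # Map run-mode names to methods; the 'rm' query parameter picks one.
    sub setup {
        my $self = shift;
        $self->start_mode('lobby');
        $self->run_modes(
            lobby => 'show_lobby',
            move  => 'process_move',
        );
    }

    sub show_lobby   { return "<html>...lobby page...</html>" }
    sub process_move { return "<html>...move accepted...</html>" }

    1;

    # The instance script (CGI or mod_perl) is then just two lines:
    #   use MyGame::App;
    #   MyGame::App->new->run;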
It's not exactly set yet, but I suppose it will be something like a single Pentium 4 (>1 GHz) with >512 MB RAM and >50 GB HD, running mod_perl under FreeBSD...
Also, under the hybrid scenario (temporary information -> shared memory, persistent information -> file IO), I don't think the temporary information in memory will be helpful after a server crash - everyone will time out on the client side and have to restart anyway. Let's hope we don't crash often (are you listening, FreeBSD?). Does this change your opinion on shared memory vs. DB? Thanks for the CGI::Application tip, I will check it out. -Dev
The Eagle Book says you can store small amounts of state information in main memory with IPC::Shareable.
"Small" is one of those wonderful words whose meaning changes as years go by. Friends claim to be stored 1Mb of stuff using IPC::Sharable with no noticeable performance hit (on a 512Mb box). (My first experience with shared memory was on a box with < 1 Mb of RAM. "Small" has come a way since then.)
What I don't really like about IPC::Shareable is that, IMO, it tends not to be a solution that scales very well. You can have a lot of IPC objects hanging around, but then there has to be some way to manage them.
- Each has to have its own little handle (ipcs will show you what you've got).
- You sound like you're going to be doing something that may need to hold multiple users. If you want to keep static information, how are you going to scale to more than one machine? Very tough to answer with IPC. RPC and a dedicated daemon to hold this would be good if you needed something more complicated than MySQL provides.
- Shared memory is bad if only one user is ever going to get at a particular segment of the memory; there is no need for the overhead of building up and tearing down the handles for that memory.
- Large state info, IMO, is best stored in a SQL table and torn down by a cron job every few minutes (a sketch of such a cleanup follows this list).
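Something like this, run from cron, is all the teardown usually needs (the table and column names are hypothetical):

    #!/usr/bin/perl
    # Cron job: purge session rows that haven't been touched recently.
    use strict;
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:database=game', 'webuser', 'secret',
        { RaiseError => 1 } );

    # Delete anything not touched in the last 15 minutes.
    $dbh->do(q{
        DELETE FROM sessions
        WHERE  last_seen < NOW() - INTERVAL 15 MINUTE
    });
    $dbh->disconnect;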
What you might want to look into for IPC-like functionality is MySQL tables of type HEAP. They are fast to open, easy to delete from, and they can scale to more than one machine: a sizable improvement over the practical headaches of IPC on a webserver.
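A rough sketch of what a HEAP-backed state table could look like via DBI (the names are invented; remember that HEAP tables live in the server's RAM, so their contents vanish when mysqld restarts):

    use strict;
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:database=game', 'webuser', 'secret',
        { RaiseError => 1 } );

    # An in-memory table for transient per-game state.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS game_state (
            game_id   INT NOT NULL,
            player    CHAR(32) NOT NULL,
            state     CHAR(64),
            PRIMARY KEY (game_id, player)
        ) TYPE = HEAP
    });

    # REPLACE keeps the row current whether or not it already exists.
    $dbh->do( 'REPLACE INTO game_state (game_id, player, state) VALUES (?, ?, ?)',
        undef, 42, 'alice', 'waiting' );

Because it speaks plain SQL, the same code keeps working if you later move the table to disk or onto a second database server.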
--jb