The main part of the program (that serves an actual request) has 'issues'.
It's a web-based frontend to a DBM database, so concurrent write access demands that a single process lock the DBM file and give up its lock when it is done.
The locking itself works, but performance for multiple users is many times worse than one would expect, given that performance with no contention is excellent.
I suspect there may be some sort of lock thrashing or similar. I need to profile this main part of the program to determine what's going on.
The profilers I've seen all want to write to a single file, which is useless for a forking program, since each child's output gets stomped by subsequent invocations.
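For concreteness, the locking scheme looks roughly like this (the file names and the choice of SDBM_File are simplified stand-ins for what the real code uses):

    use strict;
    use warnings;
    use Fcntl qw(:flock O_RDWR O_CREAT);
    use SDBM_File;

    # Serialize writers on an external lock file, tie the DBM, do the
    # update, then release everything again.
    sub update_record {
        my ($key, $value) = @_;

        open my $lock, '>', '/tmp/mydb.lock' or die "lock: $!";
        flock $lock, LOCK_EX or die "flock: $!";   # other writers block here

        tie my %db, 'SDBM_File', '/tmp/mydb', O_RDWR | O_CREAT, 0666
            or die "tie: $!";
        $db{$key} = $value;
        untie %db;

        flock $lock, LOCK_UN;
        close $lock;
    }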
Ah, so I assume you want to profile your program to see how much time it spends waiting for a lock? One way you could have a go at this, given that your profiling tools only work reliably on one process, is to run two instances of your server, both accessing the same DBM file. Of course, this assumes the locking is done externally to the program (for example with a lock file instead of a semaphore in shared memory or some such beastie).
Assuming this is the case, you then run one instance of your server and put load on it until it starts to slow down. You then run another instance, presumably on a different port, and limit that one to a single connection (either by connecting to it only once, by putting limitations in the forking code, or even by not forking at all). You can then reliably profile that server's execution. As long as you generate load on the other instance, and hence contention for the lock, you should get a reliable answer as to whether your program spends most of its time waiting for the lock.
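To make the second instance serve exactly one process, something along these lines would do - this is only a sketch, and $server and handle_request() stand in for whatever your real accept loop uses:

    # Hypothetical accept loop: when PROFILE_SINGLE is set, handle the
    # request in the parent instead of forking, so a profiler such as
    # Devel::DProf sees a single process.
    while (my $client = $server->accept) {
        if ($ENV{PROFILE_SINGLE}) {
            handle_request($client);          # stay in this process
        }
        else {
            my $pid = fork;
            die "fork: $!" unless defined $pid;
            if ($pid == 0) {                  # child
                handle_request($client);
                exit 0;
            }
        }
        close $client;
    }

Then start that instance with something like PROFILE_SINGLE=1 perl -d:DProf server.pl and hit it once or twice while the loaded instance is being hammered.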
CU Robartes-
Locking a DBM file in that manner is going to be problematic. You're always going to run into trouble - starvation, for example, where a process ends up waiting a very long time for the lock to be freed, because there is no queue ordering.
Perhaps a solution to look into would be to have one 'thread' (fork, whatever ;) access the DBM file on behalf of the other processes: it could just lock the file once and then access it for the other threads, so you immediately gain by removing the startup cost of tying the DBM. Another gain can then be made by queueing requests - you could have an in-memory shared queue object, or perhaps a file FIFO. Obviously, you still need to lock access to that queue, but since you'll be in and out of it quickly it won't hurt you as badly as something like a DBM file, and you also rule out problems like starvation.
Obviously, I'm only outlining something which is actually fairly complicated, but generally the fork-on-request model doesn't scale very well. You more often see the helper-thread model, which tends to scale a bit better. Given that only one process can have access to the file at a time, it makes much more sense to only have one process access it :) Having that process write the answer back to the client is fairly easy, and the helper threads would make sure all requests are queued in a timely fashion.
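To make the single-writer idea a bit more concrete, here is a rough sketch using a named pipe as the queue - the file names and the one-line-per-update protocol are made up for illustration:

    use strict;
    use warnings;
    use Fcntl qw(O_RDWR O_CREAT);
    use SDBM_File;
    use POSIX qw(mkfifo);

    # The single writer ties the DBM once and applies updates queued by
    # the web-facing processes through a FIFO, so nobody else needs the
    # DBM lock at all.
    my $fifo = '/tmp/dbm-writes.fifo';
    mkfifo($fifo, 0666) unless -p $fifo;

    tie my %db, 'SDBM_File', '/tmp/mydb', O_RDWR | O_CREAT, 0666
        or die "tie: $!";

    # Opening the FIFO read-write keeps it from seeing EOF every time the
    # last web process closes its end (works fine on Linux).
    open my $queue, '+<', $fifo or die "open fifo: $!";
    while (my $line = <$queue>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;   # "key<TAB>value" per update
        $db{$key} = $value;
    }

The web-facing processes just open the FIFO for writing and print a "key\tvalue\n" line; writes smaller than PIPE_BUF are atomic, so short updates don't even need their own lock around the queue.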
The other answer is to move to an RDBMS :o)
If your web front-end is triggering writes to the DBM file, and if the quantity of additions/updates is significant, then the problem may be in the DBM module itself. You could try benchmarking just that part of the application, with or without multiple threads, but making sure to simulate a reasonably heavy load of data to be absorbed. (E.g. I know that GDBM really crawls once you start adding data beyond a certain threshold, I think because it has to re-write its entire index at intervals.)
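A quick way to check for that, using the standard Benchmark module (GDBM_File here only because it's the one I mentioned - substitute whatever DBM module you actually use):

    use strict;
    use warnings;
    use Benchmark qw(timethis);
    use GDBM_File;

    # Time successive batches of insertions: if write performance falls
    # off a cliff past some size, the DBM module is the bottleneck.
    tie my %db, 'GDBM_File', '/tmp/bench.gdbm', &GDBM_WRCREAT, 0640
        or die "tie: $!";

    my $i = 0;
    for my $batch (1 .. 10) {
        timethis(10_000, sub { $db{ 'key' . $i++ } = 'x' x 100 }, "batch $batch");
    }
    untie %db;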
But on the other hand, maybe moving to an RDBMS needn't be as far off as you seem to think: MySQL won't be that hard to install, and getting it working within your current Perl/web framework might be easier than you expect. That's worth looking at, seriously.
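For scale, here's roughly how little Perl it takes to talk to MySQL through DBI (the database, table and credentials are placeholders, of course):

    use strict;
    use warnings;
    use DBI;

    # One insert and one lookup; MySQL takes care of the concurrent-writer
    # locking that the DBM file forces you to do by hand.
    my $dbh = DBI->connect('dbi:mysql:database=myapp', 'user', 'password',
                           { RaiseError => 1, AutoCommit => 1 });

    $dbh->do('INSERT INTO items (name, value) VALUES (?, ?)', undef, 'foo', 42);

    my ($value) = $dbh->selectrow_array(
        'SELECT value FROM items WHERE name = ?', undef, 'foo');
    print "foo => $value\n";

    $dbh->disconnect;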